| QSAR |

Fragment constants descriptors
Fragment constant descriptors are constants that relate the effect of substituents on a "reaction center" from one type of process to another. The basic idea is that similar changes in structure are likely to produce similar changes in reactivity, ionization, or binding. There are different constants corresponding to different effects. These are typically used to parameterize the Hammett (or Hammett-like) equation for some series of analogs. A comprehensive introduction is found in Hansch and Leo (1995). An example is:
$$\log \frac{k_X}{k_H} = \rho\,\sigma \qquad \text{(Eq. 20)}$$
where $k_X$ and $k_H$ are reaction rate constants for the substituents X and H, respectively; $\sigma$ is an electronic constant determined by an ionization constant; and $\rho$ is fit to the set of analogs being studied. Often, multiple terms corresponding to different properties (electronic, steric, etc.) at different R-group positions are used. In this way measurements of ionization constants can be used to predict rate constants, once a scaling factor ($\rho$) is determined. In this example $\rho$ measures the importance of electronic effects for the rate constant.
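As a minimal sketch of how such a scaling factor might be fit and then used, the following assumes the single-term Hammett form log10(kX/kH) = rho * sigma. The function names and the data values are hypothetical and chosen only for illustration; they are not taken from the default database.

```python
def fit_rho(sigmas, log_rate_ratios):
    """Least-squares slope through the origin for log10(kX/kH) = rho * sigma."""
    num = sum(s * y for s, y in zip(sigmas, log_rate_ratios))
    den = sum(s * s for s in sigmas)
    return num / den

def predict_rate(k_h, rho, sigma):
    """Predict kX from the parent rate constant kH, rho, and a sigma constant."""
    return k_h * 10.0 ** (rho * sigma)

# Hypothetical series of analogs (illustrative numbers only).
sigmas = [-0.2, 0.0, 0.3, 0.7]
log_ratios = [-0.4, 0.0, 0.6, 1.4]

rho = fit_rho(sigmas, log_ratios)
print(rho, predict_rate(1.0, rho, 0.5))
```

A fit through the origin is used here because the unsubstituted parent (sigma = 0) must give kX = kH; a multi-term model would need an ordinary multiple regression instead.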
The default database currently contains the following types of constants. These come from Table VI-1 of Hansch (1979), except for the sterimol constants, which are calculated.
Sm, Sp
Electronic effect sigma meta and sigma para, respectively. Positive values correspond to electron withdrawal, negative ones to electron release. Sigma is generally not appropriate for ortho substituents because of steric interaction with the reaction center.
F, R
Decompositions of sigma para constant into an inductive (polar) part (F) and a resonance part (R) for the case when the substituent is conjugated with the reaction center, producing through-resonance effects.
pi
Hydrophobic character. Pi for substituent X is given by the difference between the logP of the X-substituted compound and the logP of the unsubstituted (hydrogen) parent.
HA
Hydrogen-bond acceptor.
HB
Hydrogen-bond donor.
MR
Molar refractivity as given by:
$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d} \qquad \text{(Eq. 21)}$$
where n is the refractive index, MW is the molecular weight, and d is the compound density.
Sterimol-L
Steric length parameter, measured along the substitution point bond axis.
Sterimol-B1 through B4
Steric distances perpendicular to the bond axis. These define a bounding box for the substituent and are numbered in ascending size order.
Sterimol-B5
The overall maximum steric distance perpendicular to the bond axis.
Conformational descriptors
This table lists the conformational descriptors available in QSAR+:
Electronic descriptors
This table lists the electronic descriptors available in QSAR+:
Sum of atomic polarizabilities (Apol)
The sum of atomic polarizabilities (Apol) descriptor computes the sum of the atomic polarizabilities. The polarizabilities are calculated from the A coefficients used for molecular mechanics calculations:
Eq. 22
For more information, see Marsili and Gasteiger (1980); Hopfinger (1973).
Dipole moment (Dipole)
The dipole moment descriptor is a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Both the magnitude and the components (X, Y, Z) of the dipole moment are calculated. It is estimated from partial atomic charges and atomic coordinates. Partial atomic charges are computed using the charge setup option in the QSAR control panel, which offers CHARMm charging rules and the Gasteiger, CNDO/2, and Del Re methods. The descriptor is reported in debye units.
Dipole properties have been correlated to long-range ligand-receptor recognition and subsequent binding.
For more information, see Bottcher (1952); Del Re (1963); Gasteiger (1978; 1980); Hopfinger (1973); Marsili (1980).
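A point-charge estimate of this kind can be sketched as follows. The sketch assumes charges in units of the elementary charge and coordinates in angstroms, and uses the conversion 1 e·Å = 4.80320 D; the function name is illustrative and is not part of QSAR+.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye

def dipole_moment(charges, coords):
    """Return ((mu_x, mu_y, mu_z), |mu|) in debye from point charges.

    charges: partial atomic charges in units of e
    coords:  (x, y, z) tuples in angstroms, same order as charges
    """
    comps = []
    for axis in range(3):
        total = sum(q * xyz[axis] for q, xyz in zip(charges, coords))
        comps.append(total * E_ANGSTROM_TO_DEBYE)
    magnitude = math.sqrt(sum(c * c for c in comps))
    return tuple(comps), magnitude

# Hypothetical diatomic: +0.2 e at the origin, -0.2 e one angstrom along X.
components, magnitude = dipole_moment([0.2, -0.2], [(0, 0, 0), (1, 0, 0)])
print(components, magnitude)
```

For a neutral molecule the result does not depend on the choice of origin; for a charged species it does, so a fixed origin convention would be needed.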
Highest occupied molecular orbital energy (HOMO)
The HOMO descriptor adds the energy (in electronvolts) of the HOMO for each model, calculated by the CNDO/2 method, to the study table.
HOMO (highest occupied molecular orbital) is the highest energy level in the molecule that contains electrons. It is crucially important in governing molecular reactivity and properties. When a molecule acts as a Lewis base (an electron-pair donor) in bond formation, the electrons are supplied from the molecule's HOMO. How readily this occurs is reflected in the energy of the HOMO. Molecules with high HOMOs are more able to donate their electrons and are hence relatively reactive compared to molecules with low-lying HOMOs; thus the HOMO descriptor should measure the nucleophilicity of a molecule.
For more information, see Fischer (1969); Pople (1970; 1967; 1965; 1966); Sichel (1968); Wiberg (1968).
Lowest unoccupied molecular orbital energy (LUMO)
The LUMO descriptor adds the energy (in electronvolts) of the LUMO for each model, calculated by the CNDO/2 method, to the study table.
LUMO (lowest unoccupied molecular orbital) is the lowest energy level in the molecule that contains no electrons. It is important in governing molecular reactivity and properties.
When a molecule acts as a Lewis acid (an electron-pair acceptor) in bond formation, incoming electron pairs are received in its LUMO. Molecules with low-lying LUMOs are more able to accept electrons than those with high LUMOs; thus the LUMO descriptor should measure the electrophilicity of a molecule.
For more information, see Pople (1970; 1967; 1965; 1966); Fischer and Kollmar (1969); Sichel (1968); Wiberg (1968).
Superdelocalizability (Sr)
Superdelocalizability is an index of reactivity in aromatic hydrocarbons (AH), proposed by Fukui:
Therefore, considering the interaction of the MOs of the separated reactants gives us at least an estimate of the slope of the reaction coordinate. From this we make the additional assumptions that (1) this is a prediction of the height of the transition-state barrier at position r, and (2) the greatest interaction will occur at the site of largest orbital density, that is, largest c².
The concept of delocalizability is introduced by the e_j term. For low-lying levels, this energy is large and positive. We may interpret this as meaning the electrons are tightly held, that is, not very delocalizable. For the upper occupied states (especially HOMO), e_j is much smaller, that is, the electrons in the higher-energy orbitals are less tightly bound, which means they are relatively delocalizable. Therefore the upper energy levels will dominate the superdelocalizability term. Consequently, summing S for all atomic positions of a molecule gives a metric of electrophilicity, which may be used to predict relative reactivity in a series of molecules.
Receptor descriptors
Quantitative values, such as the interaction energy calculated in Receptor for a generated receptor model, are available to use in QSAR+. By using Receptor data to develop a QSAR model, you can evaluate the goodness of fit between a candidate structure and a postulated pseudo-receptor.
When you have generated a receptor model and have aligned the models you want to study, you can proceed to build a QSAR using data from the receptor-structure iterations.
This table lists the receptor descriptors available in QSAR+:
IntraEnergy
The internal energy of a molecule as it sits in and is constrained by the receptor model you generated.
InterEnergy
The interaction energy of the molecule with the receptor. It is the sum of the van der Waals and electrostatic interactions. The more negative the value, the greater the interaction between the molecule and the receptor.
InterEleEnergy
The electrostatic interaction energy of the molecule with the receptor.
InterVDWEnergy
The van der Waals interaction energy of the molecule with the receptor.
MinIntraEnergy
The internal energy of a molecule as it sits in the receptor site without being subject to receptor model constraints. This value should always be less than or equal to IntraEnergy.
StrainEnergy
The difference in internal energy between the molecule minimized within the receptor model (IntraEnergy) and the molecule minimized without the receptor model (MinIntraEnergy).
Quantum mechanical descriptors
Quantitative values calculated in the MOPAC application of the QUANTUM 1 card deck are available for use in QSAR+. These descriptors are the same as those of the same name described elsewhere in this chapter, except that the MOPAC descriptors are calculated using a semi-empirical method that is likely to generate more accurate values. For information on MOPAC, see the Cerius2 Quantum Mechanics -- Chemistry.
The following table lists the MOPAC descriptors available in QSAR+:
| Symbol | Description |
|---|---|
| LUMO_MOPAC | Lowest unoccupied molecular orbital energy. |
| DIPOL_MOPAC | Dipole moment. |
| HOMO_MOPAC | Highest occupied molecular orbital energy. |
| Hf_MOPAC | Heat of formation. |
Graph-theoretic descriptors
All the graph-theoretic descriptors included here ultimately base their calculations on the representation of molecular structures as graphs, where atoms are represented by vertices and covalent chemical bonds by edges.
These descriptors fall into two categories: topological descriptors and information-content descriptors.
Please refer to Terms on page 78 for explanations of graph-theoretic terms and symbols used in the descriptor definitions below.
Note: Multiple bonds, if any, are treated as single edges in all descriptor definitions unless specifically mentioned otherwise.
Topological descriptors
Topological indices are 2D descriptors based on graph theory concepts (Kier and Hall 1976, 1986; Katritzky and Gordeeva 1993). These indices have been widely used in QSPR and in QSAR studies. They help to differentiate the molecules according mostly to their size, degree of branching, flexibility, and overall shape.
Wiener index (W)
The Wiener index is the sum, over all pairs of heavy atoms in the molecule, of the number of chemical bonds on the shortest path between them. In graph-theoretical terms, it is the sum of the lengths of the minimal paths between all pairs of vertices representing heavy atoms. This is equal to half the sum of all D-matrix entries (Wiener 1947, Müller et al. 1987):
$$W = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} D_{ij} \qquad \text{(Eq. 24)}$$
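The half-sum over the distance matrix can be sketched as below. This is an illustrative implementation, not the QSAR+ code: it assumes a connected, hydrogen-suppressed graph given as a vertex count and an edge list, and obtains topological distances by breadth-first search.

```python
from collections import deque

def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph via BFS from each vertex."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    matrix = []
    for source in range(n):
        dist = [None] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        matrix.append(dist)
    return matrix

def wiener_index(n, edges):
    """Half the sum of all distance-matrix entries."""
    return sum(sum(row) for row in distance_matrix(n, edges)) // 2

# n-butane (path of 4 carbons) and isobutane (central carbon with 3 neighbors)
print(wiener_index(4, [(0, 1), (1, 2), (2, 3)]))
print(wiener_index(4, [(0, 1), (0, 2), (0, 3)]))
```

The branched isomer gives the smaller value, which illustrates how the index responds to branching at constant atom count.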
Zagreb index (Zagreb)
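The Zagreb index is defined as the sum of the squares of vertex valencies (Bonchev 1983):
$$\mathrm{Zagreb} = \sum_{i=1}^{N} \delta_i^{2} \qquad \text{(Eq. 25)}$$
where $\delta_i$ is the valency of vertex i.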
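A minimal sketch of the sum of squared vertex valencies, assuming a hydrogen-suppressed graph supplied as a vertex count and an edge list (the function name is illustrative):

```python
def zagreb_index(n, edges):
    """Sum of squared vertex degrees of a simple graph."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(d * d for d in degree)

# n-butane has degrees 1, 2, 2, 1; isobutane has degrees 3, 1, 1, 1
print(zagreb_index(4, [(0, 1), (1, 2), (2, 3)]))
print(zagreb_index(4, [(0, 1), (0, 2), (0, 3)]))
```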
Hosoya index (Z)
Let M be the number of edges and N the number of vertices in the graph. For any integer k, define p(k) to be the number of ways of choosing k non-adjacent edges from the graph. Note that p(k) is zero for k > [N/2], since k non-adjacent edges occupy 2k distinct vertices.
The Hosoya index is the sum of all (nonzero) p(k):
$$Z = \sum_{k=0}^{[N/2]} p(k) \qquad \text{(Eq. 26)}$$
with the convention that p(0) = 1 by definition.
It is a moderately easy exercise in graph theory to prove that the formula above can also be given in terms of the following recursion (implemented in C2·Diversity). Let G be the graph whose Hosoya index Z(G) is to be calculated. Remove an edge from G and denote the resulting graph by H. Again, remove the same edge from G, this time removing all the edges adjacent to it as well. Denote the resulting graph K. Then the following is always true:
$$Z(G) = Z(H) + Z(K) \qquad \text{(Eq. 27)}$$
The recursion simplifies the given graph until one or both of H and K are empty graphs (graphs with no edges), in which case the index is defined as:
$$Z(\text{empty graph}) = 1 \qquad \text{(Eq. 28)}$$
There exists a handy shortcut for graphs that consist of disjoint subgraphs (see an example calculation of Z(benzene) below) -- if G consists of disjoint subgraphs H and K, then:
$$Z(G) = Z(H)\,Z(K) \qquad \text{(Eq. 29)}$$
The index displayed in the study table is the natural logarithm of Z, to handle the rapid growth of the index with molecule size (Hosoya 1971, Rouvray 1987).
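The edge-removal recursion described above can be sketched as follows. This is an illustrative implementation under the stated recursion, not the C2·Diversity code; it does not use the disjoint-subgraph shortcut, and it memoizes intermediate edge sets only to avoid repeated work on small graphs.

```python
import math
from functools import lru_cache

def hosoya_index(edges):
    """Hosoya index Z of a simple graph given as an iterable of vertex pairs."""
    edge_set = frozenset(frozenset(e) for e in edges)

    @lru_cache(maxsize=None)
    def z(es):
        if not es:
            return 1  # a graph with no edges has Z = 1
        e = next(iter(es))
        h = es - {e}                                   # G with edge e removed
        k = frozenset(f for f in h if not (f & e))     # also drop edges adjacent to e
        return z(h) + z(k)

    return z(edge_set)

# Benzene ring as a 6-cycle of carbons (multiple bonds treated as single edges)
benzene = [(i, (i + 1) % 6) for i in range(6)]
z_benzene = hosoya_index(benzene)
print(z_benzene, math.log(z_benzene))
```

The second printed value is the natural logarithm of Z, which is the form the text says is displayed in the study table.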
Kier & Hall molecular connectivity index (χ)
This index, originally defined by Randić (1975) and subsequently refined by Kier and Hall (1976), is a series of numbers designated by "order" and "subgraph type."
There are four subgraph types: Path, Cluster, Path/Cluster, and Chain. These types emphasize different aspects of atom connectivity within a molecule -- the amount of branching, the ring structures present, and flexibility. Here we refer to these subgraph types as P, C, PC, and CH, respectively. They are defined as follows:
|
Given a connected subgraph G:
(i) If G contains a cycle it is of type CH (chain).
Otherwise:
Otherwise:
(iv) G is of type PC (Path/Cluster). That means the valencies greater than 2, equal to 2, and equal to 1, are all present.
"Order" refers to the number of edges in a subgraph. The allowable orders are 0, 1, ..., M, where M is the number of edges in the entire graph.
Notes:
|
The molecular connectivity index of order n corresponding to subgraph type s is denoted by $^{n}\chi_s$.
Given an order n and a subgraph type s, one considers all connected subgraphs of type s consisting of n edges. For each vertex $v_i$ in a subgraph, its valence $\delta_i$ (with respect to the entire graph) is calculated and the partial index $^{n}P$ corresponding to the given subgraph is found according to:
$$^{n}P = \prod_{i} (\delta_i)^{-1/2} \qquad \text{(Eq. 30)}$$
where the product runs over all vertices of the subgraph.
Finally, the partial indices are summed over all connected subgraphs of the requested type s (Kier and Hall 1976, 1985):
$$^{n}\chi_s = \sum {}^{n}P \qquad \text{(Eq. 31)}$$
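For the lowest orders the subgraphs are easy to enumerate, so the calculation can be sketched directly: vertices for order 0, edges for order 1, and pairs of edges sharing a vertex for order 2. The sketch below is illustrative only; it assumes a hydrogen-suppressed graph, takes each partial index as the reciprocal square root of the product of the vertex valences, and gives an isolated vertex (valence 0, as in methane) a weight of 0.

```python
import math
from itertools import combinations

def chi_indices(n, edges):
    """Return (chi0, chi1, chi2) for a hydrogen-suppressed molecular graph."""
    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        neighbors[a].append(b)
        neighbors[b].append(a)

    def weight(vertices):
        # reciprocal square root of the product of vertex valences
        return 1.0 / math.sqrt(math.prod(degree[v] for v in vertices))

    chi0 = sum(weight([v]) for v in range(n) if degree[v] > 0)
    chi1 = sum(weight(e) for e in edges)
    chi2 = sum(weight((a, v, b))
               for v in range(n)
               for a, b in combinations(neighbors[v], 2))
    return chi0, chi1, chi2

# Difluoromethane skeleton: vertex 0 is C, vertices 1 and 2 are F
print(chi_indices(3, [(0, 1), (0, 2)]))
# Tetrafluoromethane skeleton: vertex 0 is C, vertices 1 to 4 are F
print(chi_indices(5, [(0, 1), (0, 2), (0, 3), (0, 4)]))
```

Higher orders need a general enumeration of connected subgraphs and their classification by type, which this sketch does not attempt.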
Example of the molecular connectivity index
If we calculate molecular connectivity indices for methane and the fluorinated methanes, the following results are obtained in the study table. There is one row for each molecule as usual, and some columns for each type of subgraph. In the Topological Descriptors control panel, subgraph orders from 0 to 3 are specified as the default, so we see CHI-0 through CHI-3 columns. Had the range been 0 to 4, we would have seen a CHI-4 column as well.
| | CHI-0 | CHI-1 | CHI-2 | CHI-3_P | CHI-3_C | CHI-3_CH |
|---|---|---|---|---|---|---|
| CH4 | 0 | 0 | 0 | 0 | 0 | 0 |
| CH3F | 2 | 1 | 0 | 0 | 0 | 0 |
| CH2F2 | 2.7 | 1.414 | 0.707 | 0 | 0 | 0 |
| CHF3 | 3.577 | 1.732 | 1.732 | 0 | 0.577 | 0 |
| CF4 | 4.5 | 2 | 3 | 0 | 2 | 0 |
Order zero χ indices
Let us first consider the order zero χ indices, in the first column (CHI-0), which represent the simplest subdivision or subgraph: the set of vertices. The number of subgraphs of order zero is therefore equal to the number of skeletal atoms or vertices. Each vertex has a property δ, which is the number of its electrons in sigma bonds to skeletal neighbors.
$$\delta = \sigma - h \qquad \text{(Eq. 32)}$$
Where:
σ = number of electrons in σ bonds to all neighbors.
h = number of hydrogen atoms bonded to the atom.
Each vertex is assigned the weight:
$$c = \delta^{-1/2} \qquad \text{(Eq. 33)}$$
The order zero χ index is the sum of all vertex weights in the graph, that is, over all atoms in the skeleton.
Thus for methane, there is only one skeletal atom, C. It has four of its electrons in σ bonds and is bonded to four H atoms, and therefore has δ = 0 (that is, 4 - 4), and is assigned a χ index of 0.
| atom | h | σ | δ | c |
|---|---|---|---|---|
| C | 3 | 4 | 1 | 1 |
| F | 0 | 1 | 1 | 1 |

Order zero index for fluoromethane is: 2

| atom | h | σ | δ | c |
|---|---|---|---|---|
| C | 2 | 4 | 2 | 0.707 |
| F | 0 | 1 | 1 | 1 |
| F | 0 | 1 | 1 | 1 |

Order zero index for difluoromethane is: 2.707
The zeroth-order χ index holds little structural information. Only the presence of the nearest neighbor to each atom is captured. In the series methane through tetrafluoromethane, we see an increase in CHI-0, which reflects the increasing size of the molecule skeleton.
Order one χ indices
For the order one χ indices, the subgraphs are the graph edges, that is, the bonds that connect the skeletal atoms. We replace the atom δ with the product of the δ values of the vertices or atoms that form the edge or bond. Thus, the weight of the edge between vertices i and j is:
$$c_{ij} = (\delta_i \delta_j)^{-1/2} \qquad \text{(Eq. 35)}$$
and as before, we sum all the weights to obtain the first-order χ index.
Leaving out hydrogens, the molecular graph of methane is a single point. It has no edges and therefore has first-order χ index = 0.
Fluoromethane has one edge, representing the C-F bond.
| Edge | δi | δj | weight, c |
|---|---|---|---|
| C-F | 1 | 1 | 1 |

First order index for fluoromethane is: 1

Difluoromethane has two edges.

| Edge | δi | δj | weight, c |
|---|---|---|---|
| C-F | 2 | 1 | 0.707 |
| C-F | 2 | 1 | 0.707 |

First order index for difluoromethane is: 1.414
First-order χ indices contain more structural information than zeroth-order χ indices. The first-order χ index encodes the number of edges (bonds) in the molecular graph. Hence CHI-1 increases throughout the series methane through tetrafluoromethane. Beyond this, the immediate bonding environment of an atom is captured in the edge weights: the weight of the carbon atom becomes smaller as it becomes more substituted. This reduces the rate of increase of CHI-1 compared to CHI-0 over the same series.
In an acyclic compound containing A atoms, the number of skeletal bonds is:
$$P = A - 1 \qquad \text{(Eq. 37)}$$
where P is called the number of paths of length 1. A "path of length 1" is a bond.
In a cyclic compound with R rings:
$$P = A - 1 + R \qquad \text{(Eq. 38)}$$
Thus the number of first-order weights encodes the number of rings.
Second-order χ indices
For order two, we consider pairs of connected edges (bonds) in the molecular graph. Since methane has no bonds and fluoromethane has only one bond, CHI-2 for methane and fluoromethane is zero. Difluoromethane has one path of length 2:
F-C-F (Eq. 39)
This is computed in a manner analogous to the lower-order indices, as a product of reciprocal square roots:
$$c_{ijk} = (\delta_i \delta_j \delta_k)^{-1/2} \qquad \text{(Eq. 40)}$$
Thus for difluoromethane:

| Path | Weight |
|---|---|
| F-C-F | (1 × 2 × 1)^(-1/2) = 0.707 |
Trifluoromethane has three paths of length 2:

| Atom | h | σ | δ |
|---|---|---|---|
| C | 1 | 4 | 3 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |

| 2P | Weight, c |
|---|---|
| F-C-F | 0.577 |
| F-C-F | 0.577 |
| F-C-F | 0.577 |
| Total | 1.732 |
Third-order χ indices
None of the compounds has any paths of length 3, which would require three edges (that is, three bonds) to be connected in sequence, so the CHI-3_P values for the series are all zero. On the other hand, 1,2-difluoroethane, had we included it, would have a CHI-3_P index of 0.5.
However, there is another kind of third-order subgraph called a cluster, which involves four skeletal atoms in a trigonal relationship. In this example, this structural motif appears only in trifluoromethane and tetrafluoromethane. For trifluoromethane:

| Atom | h | σ | δ |
|---|---|---|---|
| C | 1 | 4 | 3 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |

| 3P | Weight, c |
|---|---|
| CHF3 | (3 × 1 × 1 × 1)^(-1/2) = 0.577 |
| Total | 0.577 |

The smallest possible ring is three membered and, if there were any three-membered rings in our set, they would be captured by the CHI-3_CH index (CH for "chain", meaning "ring").
Fourth-order χ indices
Similarly, none of our compounds has any paths of length 4, which would require four edges connected in sequence, hence all values for the CHI-4_P index are zero. Tetrafluoromethane contains a fourth-order cluster, however:

| Atom | h | σ | δ |
|---|---|---|---|
| C | 0 | 4 | 4 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |
| F | 0 | 1 | 1 |

| 4P | Weight, c |
|---|---|
| CF4 | (4 × 1 × 1 × 1 × 1)^(-1/2) = 0.5 |
| Total | 0.5 |
The higher-order χ indices are additive (because they are sums of weighting terms) and constitutive (because the size of the weights depends on atomic δ values), representing the entire molecular graph.
Kier & Hall valence-modified connectivity index (χv)
This index is a refinement of the molecular connectivity index (see Kier & Hall molecular connectivity index (χ) for definitions) where a vertex subgraph valence δ is enhanced to δv to take into account the electron configuration of the atom represented by the vertex:
$$\delta^{v} = \frac{Z^{v} - h}{Z - Z^{v} - 1} \qquad \text{(Eq. 41)}$$
where Zv is the number of valence electrons in the atom, Z is its atomic number, and h is the number of hydrogens bound to it. This formula is designed to reproduce the unmodified molecular connectivity index for saturated hydrocarbons, for which δv = δ. However, δv distinguishes between multiple and single bonds. The denominator introduces further distinction between element rows due to the presence of the atomic number Z (Kier and Hall 1976, 1985).
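As a small worked sketch, the valence-modified vertex value can be computed from the three quantities named above. The sketch assumes the usual Kier and Hall form, (Zv - h) / (Z - Zv - 1); the function name is illustrative.

```python
def delta_v(z, z_valence, h):
    """Valence-modified vertex value from atomic number, valence electrons, and attached H count."""
    return (z_valence - h) / (z - z_valence - 1)

# Saturated carbons reproduce the plain graph valence:
print(delta_v(6, 4, 3))   # CH3 carbon, one skeletal neighbor
print(delta_v(6, 4, 2))   # CH2 carbon, two skeletal neighbors
# Heteroatoms differ: fluorine and chlorine have the same graph valence (1)
print(delta_v(9, 7, 0))   # fluorine
print(delta_v(17, 7, 0))  # chlorine
```

The fluorine and chlorine values differ only through the denominator, which is how the atomic number separates element rows.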
Kier & Hall subgraph count index (SC)
This is the number of subgraphs of a given type and order (Kier and Hall 1976). (See Kier & Hall molecular connectivity index (χ) for definitions.)
Example of the Kier & Hall subgraph count index
Refers to the number of zero-order subgraphs in the molecular graph. The number of subgraphs of order zero is simply the number of skeletal atoms or vertices in the molecular graph.
The number of first-order subgraphs in the molecular graph, which is the number of edges that connect the vertices of the molecular graph. In other words, it is the number of bonds in the molecule.
The number of second-order subgraphs in the molecular graph, which is the number of pairs of connected edges. In other words, it is the number of paths of length 2.
Third-order indices
There are three types of third-order subgraph: Path, Cluster and Ring.
The number of third-order subgraphs in the molecular graph: the number of paths of length 3.
Counts the number of clusters.
Counts the number of rings or chains.
Kier's shape indices (nκ (n = 1, 2, 3))
These indices compare the molecule graph with "minimal" and "maximal" graphs, where the meaning of "minimal" and "maximal" depends on the order n. This is intended to capture different aspects of the molecular shape.
Order 1:
1κ encodes the count of atoms and the presence of cycles relative to the minimal and maximal graphs. For N vertices, the maximal graph includes edges between all vertex pairs. For the minimal graph a linear path of N - 1 edges connecting the vertices is taken.
The shape index of order 1 is then defined as:
$$^{1}\kappa = \frac{2\,P_{max}\,P_{min}}{P^{2}} \qquad \text{(Eq. 42)}$$
where P is the number of edges in the graph (edges are paths of length 1, hence the order index 1 on 1κ), Pmax is the number of edges in the maximal graph -- namely N(N - 1)/2 -- and Pmin is the number of edges in the minimal graph -- namely N - 1.
By inserting the formulas for Pmax and Pmin, one obtains the implemented formula:
$$^{1}\kappa = \frac{N(N-1)^{2}}{P^{2}} \qquad \text{(Eq. 43)}$$
Order 2:
2κ encodes the branching. P, Pmin, and Pmax now denote the number of paths of length 2 in the corresponding graphs. The maximal graph is taken to be the star graph in which all atoms are adjacent to a common atom. Thus, Pmax = (N - 1)(N - 2)/2. The linear graph is again taken as the minimal graph, so Pmin = N - 2. Eq. 42 above thus yields:
$$^{2}\kappa = \frac{(N-1)(N-2)^{2}}{P^{2}} \qquad \text{(Eq. 44)}$$
Order 3:
P now denotes the number of paths of length 3. Eq. 42 is adjusted by another factor of 2 -- in the words of the index designer -- "to bring the values into rough equivalence with the other kappa values" (Kier 1990, Hall and Kier 1991):
$$^{3}\kappa = \frac{4\,P_{max}\,P_{min}}{P^{2}} = \begin{cases} \dfrac{(N-1)(N-3)^{2}}{P^{2}} & N \text{ odd} \\[2ex] \dfrac{(N-3)(N-2)^{2}}{P^{2}} & N \text{ even} \end{cases} \qquad \text{(Eq. 45)}$$
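The first two orders can be sketched from the maximal and minimal path counts stated above (complete graph and linear graph for order 1, star graph and linear graph for order 2). This is an illustrative sketch, not the QSAR+ implementation; it assumes a simple hydrogen-suppressed graph and counts paths of length 2 as pairs of edges meeting at a vertex.

```python
def path_counts(n, edges):
    """Return (P1, P2): numbers of paths of length 1 and length 2 in a simple graph."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    p1 = len(edges)
    p2 = sum(d * (d - 1) // 2 for d in degree)  # edge pairs sharing a vertex
    return p1, p2

def kappa1(n, p1):
    """2 * Pmax * Pmin / P^2 with Pmax = N(N-1)/2 and Pmin = N-1."""
    return n * (n - 1) ** 2 / p1 ** 2

def kappa2(n, p2):
    """2 * Pmax * Pmin / P^2 with Pmax = (N-1)(N-2)/2 and Pmin = N-2."""
    return (n - 1) * (n - 2) ** 2 / p2 ** 2

# n-pentane (linear) versus neopentane (star), both with N = 5
for name, edges in [("n-pentane", [(0, 1), (1, 2), (2, 3), (3, 4)]),
                    ("neopentane", [(0, 1), (0, 2), (0, 3), (0, 4)])]:
    p1, p2 = path_counts(5, edges)
    print(name, kappa1(5, p1), kappa2(5, p2))
```

Both isomers share the same order 1 value because they have the same atom and bond counts, while the order 2 value separates the linear skeleton from the fully branched one.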
Kier's alpha-modified shape indices (nκα (n = 1, 2, 3))
These indices are refinements of the shape index (see previous section) that take into consideration the contribution covalent radii and hybridization states make to the shape of the molecule. The indices nκα are defined by Eq. 43 - 45, with the atom count N replaced by the modified atom count N + α. The modifier α is defined as:
$$\alpha = \sum_{i}\left(\frac{r_i}{r_{Csp^3}} - 1\right) \qquad \text{(Eq. 46)}$$
where the summation is over all heavy atoms of the molecule. Here, ri is the radius of the ith heavy atom and rCsp3 is the radius of the sp3 carbon (taken to be 0.77 Å in this implementation). In this calculation the following atoms are considered to be heavy: C, N, O, F, P, Cl, Br, and I (Kier 1990, Hall and Kier 1991).
Molecular flexibility index (Φ)
This is a descriptor based on structural properties that restrict a molecule from being "infinitely flexible", the model for which is an endless chain of C(sp3) atoms. The structural features considered as preventing a molecule from attaining infinite flexibility are: (a) fewer atoms, (b) the presence of rings, (c) branching, and (d) the presence of atoms with covalent radii smaller than those of C(sp3). These features are encoded in the index as follows:
$$\Phi = \frac{{}^{1}\kappa_{\alpha}\;{}^{2}\kappa_{\alpha}}{N} \qquad \text{(Eq. 47)}$$
where N = number of vertices (Hall and Kier 1991).
Balaban indices (JX and JY)
This is a highly discriminating descriptor, whose values do not substantially increase with molecule size and the number of rings present (Balaban 1982, Balaban and Ivanciuc 1989). Its evaluation begins with the D-matrix, which is reduced to the vertex distance sums si as follows:
$$s_i = \sum_{j=1}^{N} D_{ij} \qquad \text{(Eq. 48)}$$
where N is the number of vertices and i = 1, ... , N.
At this stage the contributions based on heteroatom electronegativities and heteroatom covalent radii are included by modifying the si values. The modifiers are two-parameter approximations of electronegativities and covalent radii relative to those of carbon. The exact formulas used in the index calculations are:
Eq. 49
Eq. 50
where Zi is the atomic number and Gi is the (short) periodic table group number. These modifiers are used only with nonmetals: B, C, N, O, F, Si, P, S, Cl, As, Se, Br, Te, and I. For other heteroatoms the values are set at X = Y = 1.
Given the values of X and/or Y for each vertex, the numbers si are adjusted as follows:
$s^{a}_{i} = X s_i$ (for the index JX)
$s^{a}_{i} = Y s_i$ (for the index JY)
and the result inserted in the final formula for the index:
$$J = \frac{M}{M - N + 2}\sum_{(i,j)} \left(s^{a}_{i}\, s^{a}_{j}\right)^{-1/2} \qquad \text{(Eq. 51)}$$
where J equals either JX or JY, depending on the modifier type used, M is the number of edges, N is the number of vertices, and the sum is over all pairs (i, j) with adjacent vertices vi and vj.
Note: The denominator M - N + 2 is really "number of cycles plus 1" (by the Euler formula) and serves as a normalization against the number of rings present in the molecule.
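The overall calculation can be sketched for the simplest case. The sketch below assumes a connected graph with all bonds treated as single edges, takes si as the sum of topological distances from vertex i to every other vertex, and defaults every modifier to 1 (an all-carbon skeleton), so it returns the unmodified J. The optional `modifiers` argument is a hypothetical hook for per-vertex X or Y values supplied by the caller; the X and Y formulas themselves are not reproduced here.

```python
import math
from collections import deque

def balaban_j(n, edges, modifiers=None):
    """Balaban-type index: M / (M - N + 2) * sum over edges of (s_i * s_j)^(-1/2)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    sums = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        sums.append(sum(dist.values()))  # distance sum of this vertex

    if modifiers is not None:
        sums = [m * s for m, s in zip(modifiers, sums)]

    m = len(edges)
    total = sum(1.0 / math.sqrt(sums[a] * sums[b]) for a, b in edges)
    return m / (m - n + 2) * total

propane = [(0, 1), (1, 2)]
cyclohexane = [(i, (i + 1) % 6) for i in range(6)]
print(balaban_j(3, propane), balaban_j(6, cyclohexane))
```

The ring example shows the normalization at work: with six edges and six vertices the prefactor is 6 / 2 rather than 6 / 1.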
Information-content descriptors
In this approach, molecules are viewed as structures that can be partitioned into subsets of elements that are in some sense equivalent. The notion of equivalence depends on the particular descriptor. Consider a partition of a set of N elements into k subsets, with subset i consisting of Ni elements:
equivalence class: 1 2 ... k
number of elements in each: N1 N2 ... Nk
N1 + N2 + ... + Nk = N
Given a partition P as above, we use the notation:
P = N(N1, N2, ... , Nk)
A probability distribution can be associated with the partition:
pi = Ni / N
the probability for a randomly chosen element to belong to class i. This degree of uncertainty can also be expressed by the entropy:
Hi = - lb pi (lb is the base-2 logarithm).
The mean entropy of such a probability distribution is then:
$$H = \sum_{i=1}^{k} p_i H_i = -\sum_{i=1}^{k} p_i \,\mathrm{lb}\, p_i \qquad \text{(Eq. 52)}$$
which, according to Shannon's statistical information theory (Bonchev 1983, and references therein), can be viewed as a measure of the mean quantity of information contained in each structure element (in bits per element).
The partition P, the probabilities pi and the mean quantity of information H form the pattern of calculation for all the information-theoretic descriptors.
Information of atomic composition index (IAC-mean, IAC-total)
The atoms in the molecule are partitioned into equivalence classes corresponding to their atomic numbers. The partition then yields the descriptor IAC-Mean as the mean quantity of information H as defined above.
The descriptor IAC-Total is defined as N × IAC-Mean, where N is the number of atoms in the molecule.
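The pattern of calculation (partition, class probabilities, mean information) can be sketched for the atomic composition index as follows. This is an illustrative sketch that assumes the mean information is the Shannon entropy in bits of the class probabilities Ni / N; the function names are not part of QSAR+.

```python
import math
from collections import Counter

def mean_information(class_sizes):
    """Mean information content in bits per element for a partition given by class sizes."""
    n = sum(class_sizes)
    return sum(-(k / n) * math.log2(k / n) for k in class_sizes if k > 0)

def iac(atomic_numbers):
    """Return (IAC-Mean, IAC-Total) for a molecule given as a list of atomic numbers."""
    sizes = list(Counter(atomic_numbers).values())
    mean = mean_information(sizes)
    return mean, len(atomic_numbers) * mean

# Methane with hydrogens included: one carbon and four hydrogens
print(iac([6, 1, 1, 1, 1]))
```

The same `mean_information` helper applies to any of the partitions described in this section once the class sizes are known.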
Information indices based on the A-matrix
The two information indices in this category are:
The A-matrix consists of zeros and ones, so the partitioning consists of two classes:
P = N²(2M, N² - 2M)
with M equal to the number of edges (thus 2M equals the number of ones in the A-matrix) and N equal to the number of vertices (N² - 2M is the number of zeros in the A-matrix).
Eq. 53
Vertex adjacency/magnitude
Each matrix element aij is now treated as an equivalence class of aij elements. In this case, each equivalence class consists of either one or zero elements, so the partition is (discarding the classes of zero elements):
P = 2M( 1, 1, ... , 1 ) (2M ones)
The index V_ADJ_mag is thus rather simple:
Eq. 54
Information indices based on the D-matrix
Two types of indices are based on this matrix:
Information indices based on the E-matrix and the ED-matrix
The indices based on these matrices are:
Multigraph information content indices (IC, BIC, CIC, SIC)
To each vertex v, an unordered sequence of ordered pairs is assigned:
{ (m1, n1), (m2, n2), ... , (mk, nk) }, called a coordinate, such that:
k = the valence of the vertex (there is one ordered pair (mj, nj) per each neighboring vertex, vj), and for every j = 1, ..., k:
The index corresponding directly to this partition is the index IC ("Information Content").
The following indices are normalizations of IC:
ICmax = -N × (1/N) × lb(1/N) = lb(N)
and thus the CIC index is defined as:
Molecular shape analysis (MSA) descriptors
This table lists the MSA descriptors available in QSAR+:
Common overlap steric volume (COSV)
The common volume between each individual molecule and the molecule selected as the reference compound. This is a measure of how similar in steric shape the analogs are to the shape reference.
Difference volume (DIFFV)
The difference between the volume of the individual molecule and the volume of the shape reference compound.
Common overlap volume ratio (Fo)
The common overlap steric volume descriptor divided by the volume of the individual molecule.
Non-common overlap steric volume (NCOSV)
The volume of the individual molecule minus the common overlap steric volume.
Rms to shape reference (ShapeRMS)
Root mean square (rms) deviation between the individual molecule and the shape reference compound.
Volume of shape reference (SRVol)
The volume of the shape reference compound.
Spatial descriptors
This table lists the spatial descriptors available in QSAR+:
Shadow indices
This set of geometric descriptors helps to characterize the shape of the molecules. The descriptors are calculated by projecting the molecular surface on three mutually perpendicular planes, XY, YZ, and XZ (Rohrbaugh and Jurs 1987). These descriptors depend not only on conformation but also on the orientation of the molecule. To calculate them, the molecules are first rotated to align the principal moments of inertia with the X, Y, and Z axes.
A total of 10 descriptors are calculated in this set:
1. Area of the molecular shadow in the XY plane (Sxy).
2. Area of the molecular shadow in the YZ plane (Syz).
3. Area of the molecular shadow in the XZ plane (Sxz).
4. Fraction of area of molecular shadow in the XY plane over area of enclosing rectangle (Sxy,f).
5. Fraction of area of molecular shadow in the YZ plane over area of enclosing rectangle (Syz,f).
6. Fraction of area of molecular shadow in the XZ plane over area of enclosing rectangle (Sxz,f).
7. Length of molecule in the X dimension (Lx).
8. Length of molecule in the Y dimension (Ly).
9. Length of molecule in the Z dimension (Lz).
10. Ratio of largest to smallest dimension.
Jurs descriptors based on partial charges mapped on surface area
This set of descriptors (Stanton and Jurs 1990) combines shape and electronic information to characterize molecules. The descriptors are calculated by mapping atomic partial charges on solvent-accessible surface areas of individual atoms. A total of 30 different descriptors are included in the set:
24. Relative positive charge surface area: solvent-accessible surface area of the most positive atom divided by descriptor 22 (RPCS).
25. Relative negative charge surface area: solvent-accessible surface area of the most negative atom divided by descriptor 23 (RNCS).
30. Total molecular solvent-accessible surface area (SASA).
Molecular surface area (Area)
The molecular surface area descriptor is a 3D spatial descriptor that describes the van der Waals area of a molecule. The molecular surface area determines the extent to which a molecule exposes itself to the external environment. This descriptor is related to binding, transport, and solubility.
Radius of gyration
The radius of gyration is calculated using the following equation:
$$R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i^{2} + y_i^{2} + z_i^{2}\right)} \qquad \text{(Eq. 55)}$$
where N is the number of atoms and x, y, z are the atomic coordinates relative to the center of mass.
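A minimal sketch of this calculation follows. It assumes the unweighted form, that is, the root mean square distance of the N atoms from the center of mass, with atomic masses used only to locate the center of mass; if no masses are given, equal masses are assumed. The function name is illustrative.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Root mean square distance of atoms from the center of mass."""
    n = len(coords)
    if masses is None:
        masses = [1.0] * n  # assumption: equal masses when none are supplied
    total_mass = sum(masses)
    center = [sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
              for axis in range(3)]
    squared = sum((c[axis] - center[axis]) ** 2
                  for c in coords for axis in range(3))
    return math.sqrt(squared / n)

# Two equal atoms 2 angstroms apart: each lies 1 angstrom from the center of mass
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

Because the coordinates are taken relative to the center of mass, the result does not change when the whole molecule is translated.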
Density (Density)
A 3D spatial descriptor that is defined as the ratio of molecular weight to molecular volume. It has units of g/mL. The density reflects the types of atoms and how tightly they are packed in a molecule. Density can be related to transport and melting behavior.
Principal moment of inertia (PMI)
Calculates the principal moments of inertia about the principal axes of a molecule according to the following rules:
Molecular volume (Vm)
A 3D spatial descriptor that defines the molecular volume inside the contact surface. The molecular volume is calculated as a function of conformation. Molecular volume is related to binding and transport.
Structural descriptors
This table lists the structural descriptors available in QSAR+:
Number of rotatable bonds (Rotlbonds)
Counts the number of bonds in the current molecule having rotations that are considered to be meaningful for molecular mechanics. All terminal H atoms are ignored (for example, methyl groups are not considered rotatable).
Thermodynamic descriptors
This table lists the thermodynamic descriptors available in QSAR+:
AlogP, AlogP98, and molar refractivity (MolRef)
LogP (the octanol/water partition coefficient) and molar refractivity are molecular descriptors that can be used to relate chemical structure to observed chemical behavior. LogP is related to the hydrophobic character of the molecule. The molar refractivity of a substituent is a combined measure of its size and polarizability.
The QSAR+ descriptors AlogP and molar refractivity (MolRef) are calculated using the method described by Ghose & Crippen (1989). In this atom-based approach, each atom of the molecule is assigned to a particular class, and each class makes an additive contribution to the total value of logP and molar refractivity.
For more information, see Leffler and Grunwald (1963).
The AlogP98 descriptor is an implementation of the atom-type-based AlogP method using the latest published set of parameters (Ghose et al. 1998).
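The additive scheme can be illustrated with a short sketch. The class names and contribution values below are placeholders chosen for the example; they are not the published Ghose and Crippen parameters, and the step that assigns atoms to classes is not shown.

```python
def additive_property(atom_classes, contributions):
    """Sum the per-class contribution for every atom in the molecule."""
    total = 0.0
    for cls in atom_classes:
        if cls not in contributions:
            raise ValueError(f"no parameter for atom class {cls!r}")
        total += contributions[cls]
    return total

# Placeholder parameter table (hypothetical values, NOT real AlogP parameters)
LOGP_CONTRIB = {"class_A": 0.5, "class_B": -0.25, "class_C": 1.0}

# One entry per atom, after each atom has been assigned to a class
atoms = ["class_A", "class_A", "class_B", "class_C"]
print(additive_property(atoms, LOGP_CONTRIB))  # 1.75
```

A second table of the same form, holding refractivity contributions, would give molar refractivity from the same atom classification.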
Desolvation free energy for water (FH2O) and octanol (Foct)
Foct and FH2O are physicochemical properties associated with linear free energy (LFE) models of a molecule. These properties have proven useful as molecular descriptors in structure-activity analyses. All LFE computations are based solely on the connectivity of the atoms in a molecule and are therefore not conformationally dependent.
Foct is the 1-octanol desolvation free energy and FH2O is the aqueous desolvation free energy. Both are derived from a hydration shell model developed by Hopfinger and are expressed in kcal/mol.
QSAR+ calculates FH2O and Foct for each molecule by searching the molecule for recognizable substituent groups and their bonding patterns, then summing the substituent-constant contributions of each group that is present in the molecule.
For more information, see Hopfinger (1973; 1980) and Pearlman (1980).
Heat of formation (Hf)
The enthalpy for forming a molecule from its constituent atoms, a measure of the relative thermal stability of a molecule. This descriptor is calculated using the MNDO semi-empirical molecular orbital method of Dewar. MNDO is the most rigorous quantum-chemical technique available in QSAR+ and has a wide range of applicability in conformational analysis, intermolecular modeling, and chemical reaction modeling. The atom limit of MNDO is 300 atoms or 300 atomic orbitals (whichever is less) per molecule. The atoms treated by MNDO are: H, B, C, N, O, F, Al, Si, P, S, and Cl.
For more information, see Dewar and Thiel (1977a; 1977b).
pKa descriptors (ACD Labs)
pKa values are calculated and the results displayed in the study table according to user-defined rules.
The pKa program, available separately from Advanced Chemistry Development (ACD), is needed for use of this descriptor. You can contact ACD through their website at www.acdlabs.com.
Molecular field analysis (MFA) descriptors
Molecular field analysis (MFA) evaluates the energy between a probe and a molecular model at a series of points defined by a rectangular or spherical grid. These energies may be added to the study table to form new columns headed according to the probe type. The new columns may be used as independent X variables in the generation of QSARs. For more information about working with MFA descriptors, see Chapter 9, Performing Molecular Field Analysis.
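The grid evaluation can be sketched as follows for a rectangular grid and a charged point probe. This is an illustration under stated assumptions rather than the MFA implementation: only a Coulomb term is computed, and the spacing, margin, minimum distance, and energy cutoff are hypothetical defaults.

```python
import math

COULOMB_KCAL = 332.0637  # kcal Å mol^-1 e^-2

def rectangular_grid(coords, spacing=2.0, margin=4.0):
    """Regular grid enclosing the molecule plus a margin on every side."""
    lo = [min(c[k] for c in coords) - margin for k in range(3)]
    hi = [max(c[k] for c in coords) + margin for k in range(3)]
    counts = [int(math.floor((hi[k] - lo[k]) / spacing)) + 1 for k in range(3)]
    return [(lo[0] + ix * spacing, lo[1] + iy * spacing, lo[2] + iz * spacing)
            for ix in range(counts[0])
            for iy in range(counts[1])
            for iz in range(counts[2])]

def probe_energies(coords, charges, grid, probe_charge=1.0,
                   cutoff=30.0, min_dist=1.0):
    """Coulomb energy (kcal/mol) of the probe at every grid point."""
    energies = []
    for point in grid:
        energy = 0.0
        for atom, q in zip(coords, charges):
            r = max(math.dist(point, atom), min_dist)  # avoid division by ~0
            energy += COULOMB_KCAL * probe_charge * q / r
        # Truncate extreme values near the nuclei (assumed cutoff)
        energies.append(max(-cutoff, min(cutoff, energy)))
    return energies

coords = [(0.0, 0.0, 0.0)]
charges = [0.1]
grid = rectangular_grid(coords, spacing=2.0, margin=2.0)
energies = probe_energies(coords, charges, grid)
print(len(grid))  # 27 grid points, one candidate study-table column each
```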
Receptor surface analysis (RSA) descriptors
For a theoretical description of receptor surface models, please see Cerius2 Hypothesis and Receptor Models, which touches briefly on functionality in the Receptor module called receptor surface analysis (RSA).
If you have used Receptor, you may already be familiar with the idea of using the energy of interaction between a drug model and a receptor surface model to calculate a QSAR. For an example of this, run the demonstration log file Cerius2-Resources/EXAMPLES/demos/DDW_receptordemo2.log. The energies of interaction between the receptor surface model and each molecular model are added to the study table as new columns, which you can use for generating QSARs. These energies may be added to the study table with the Receptor_energies descriptor.
An additional descriptor, Receptor_RSA, adds the energy of interaction between each point on the receptor surface and each model to the study table, so that these surface point energies can be used to calculate a QSAR. Instead of a single total (the sum of the interactions over all surface points, which yields one extra column in the study table), the energies at each surface point are available individually.
Depending on the size of the drug molecules, this is potentially a great number of surface points. Filtering methods are available to reduce the input to the study table, based on the variance of the energies at any point, correlation of the energies with activity data, or simply adding every nth point.
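The three kinds of filter can be sketched as below. Each column holds the energies at one surface point, with one value per molecule, and each filter returns the indices of the columns to keep. The function names and thresholds are hypothetical; the actual QSAR+ filter options and defaults are not shown here.

```python
import math

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def filter_by_variance(columns, min_variance):
    """Keep surface points whose energies vary enough across molecules."""
    return [k for k, col in enumerate(columns) if variance(col) >= min_variance]

def filter_by_correlation(columns, activity, min_abs_r):
    """Keep surface points whose energies correlate with activity."""
    return [k for k, col in enumerate(columns)
            if abs(pearson(col, activity)) >= min_abs_r]

def every_nth(columns, n):
    """Keep every nth surface point."""
    return list(range(0, len(columns), n))

# Four surface points, four molecules (illustrative numbers only)
columns = [[1.0, 1.0, 1.0, 1.0],
           [0.0, 1.0, 2.0, 3.0],
           [3.0, 1.0, 2.0, 0.0],
           [0.0, 0.1, 0.0, 0.1]]
activity = [1.0, 2.0, 3.0, 4.0]

print(filter_by_variance(columns, 0.5))               # [1, 2]
print(filter_by_correlation(columns, activity, 0.9))  # [1]
print(every_nth(columns, 2))                          # [0, 2]
```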
The technique resembles CoMFA but, instead of a rectangular grid, the points considered are taken from the receptor surface. Therefore they are probably more chemically relevant than a rectangular grid, because they exist on a surface that is shaped like a molecule, and even better, a surface constructed from a subset of active molecules.
After adding the receptor surface point energies to the study table, you may calculate a QSAR using the receptor surface energies and biological activities. Early tests indicate that if the genetic function algorithm (GFA) method is used, nonlinear terms must be included.