| Converter |


A less conventional approach is the direct application of distance geometry techniques to generate a 3D structure for small to medium sized molecules (up to about 100 atoms). Distance geometry has traditionally been used as a tool for generating protein structures in solution in conjunction with distance constraints derived from NOE data. Its application to small molecules is perhaps less widely known, although some published examples exist.
The basic approach taken during the two- to three-dimensional conversion process is to assign distance bounds to all atom pairs of the molecule, together with planarity and chirality (using wedged bond information) constraints, and then to supply the bounds and constraints as input to a distance geometry program for generation of 3D coordinates. In this case, the bounds and constraints are supplied to the DG-II program package (Havel 1991) for generation of 3D coordinates. The particular sequence of steps employed here involves: 1) triangle bounds smoothing; 2) Havel's metrization algorithm for improved sampling; 3) embedding into four dimensions; and 4) four-dimensional minimization followed by three-dimensional minimization, which helps to satisfy complicated chirality constraints, if they exist. The 1991 paper by Havel (and references therein) contains more complete descriptions of the distance geometry algorithms.
After the 3D coordinates have been generated, the molecular structure is appended to the output database file. The structure can be read into the Insight II software with the Molecule/Get command.

1. Compute a complete set of bounds via a process known as bound smoothing.
To avoid local minima in the optimization step, it is sometimes desirable to compute 4D coordinates for use in Steps 3 and 4 (see Computation of Distances in Four Dimensions). Since all the mathematics on which these procedures are based carry over without change to higher dimensions, it is sufficient to restrict ourselves to three dimensions in what follows.
Describing Molecular Conformation
The fundamental premise on which distance geometry is based is that the set of accessible conformations can be aptly described by means of suitable distance and chirality constraints.1 The first are simply lower and upper bounds on the interatomic distances; the second include the handedness of the asymmetric centers in the molecule, together with the planarity of any sp2 hybridized centers. This description, called a distance geometry description, is essentially the computer analogue of a CPK model. A large body of mathematical results exists that can be used to better understand these descriptions, much of which can be found in the recent book by Crippen and Havel (1988).
Covalent Distance Constraints
The local covalent structure of a molecule is easily defined by distance constraints. Unless you are dealing with a highly strained ring system, it is sufficient to use only exact distance constraints in which the lower and upper bounds are equal. For example, the distances among covalently bonded pairs of atoms are determined with high precision by the order of the bond and the types of the atoms it connects. Similarly, the bond angles can usually be determined from the covalent structure. For fixed bond lengths there is a one-to-one relationship between the bond angle and the geminal distance, so that these distances can also be determined. This relation between the geminal distance d13 and the bond angle
is given explicitly by the law of cosines:
Vicinal Distance Constraints
Similarly, when the bonded and geminal distances are fixed, there is a one-to-one relation between the absolute value of the torsion angle
and the vicinal distance, given by:
Eq. 1
where l14 and u14 are the cis and trans limits on the 1,4 distance, here known as the tetrangle inequality limits. In particular, by setting all the vicinal distances across a double bond to the appropriate cis or trans limits, we can force the double bond to have any desired stereochemistry. We can even use inexact distance constraints to allow a specified amount of wobble in the torsion angle about the double bond.
Chirality Constraints
The chirality
1234 of an ordered quadruple of points numbered 1, 2, 3, 4 is given in terms of their Cartesian coordinates by the sign of the following determinant:
Eq. 2
It is well known that the absolute value of this determinant is six times the volume of the tetrahedron whose vertices are the given points, and hence the value of the determinant itself is known as the oriented volume. Note also that the sign of oriented volume changes whenever the x, y, or z coordinates are multiplied by -1, i.e., whenever the tetrahedron is reflected in a plane. This shows that the chirality of the quadruple of atoms bonded to an asymmetric carbon atom is capable of distinguishing between a molecule and its mirror image. In the special case of a planar quadruple of atoms, the volume and hence the chirality are zero.
Steric Distance Constraints
Since two atoms cannot be in nearly the same place at the same time, in order to obtain reasonable conformations it is also necessary to impose lower bound constraints on the distances between all pairs of atoms separated by more than three bonds. For the sake of simplicity, these lower bounds are generally set to the sum of suitable hard sphere radii, i.e., lij = ri + rj.
Triangle Bound Smoothing
This step of the EMBED algorithm predicts the range of values that the interatomic distances can assume in any conformation of the molecule that satisfies the given distance bounds. Hence these ranges, known as the Euclidean limits, are necessarily at least as tight as the given bounds. The calculation of the Euclidean limits is a difficult and unsolved problem, but it is possible to calculate loose approximations. The term "loose" means that these approximate limits are wider than the actual Euclidean limits, so that no conformations are excluded by them that were not excluded by the given bounds. In addition to its role in the EMBED algorithm, bound smoothing can sometimes detect inconsistencies in the bounds. Converter uses a program in the DGII package for bounds smoothing that employs triangle inequality.
Eq. 3
for all triples of atoms i, j, k. It follows that if dik
uik and djk
ujk, then:
Eq. 4
so if uij > uik + ujk, then uij > dij, and hence uij can be replaced by the upper limit uik + ujk on dij without eliminating any conformations that satisfy the constraints dik
uik and djk
ujk.
Eq. 5
it follows that if lij < lik - ujk, then we can replace lij by its lower limit lik - ujk. Note these two substitutions are valid even if no lower and/or upper bound is available on the distance dij, since then we effectively have lij = 0 and/or uij =
. Moreover, if such a lower limit exceeds the upper bound (or an upper limit) on the same distance, then we have essentially proved by contradiction that there exists no conformation that satisfies the bounds, i.e., that the bounds are geometrically inconsistent (as defined above). We call this a triangle inequality violation.
Metrization
With a sufficiently complete and precise set of distance constraints, reasonable coordinates can be obtained simply by choosing the distances independently from between their lower and upper limits with a uniform distribution. To obtain the widest possible sampling of conformation space, however, a procedure known as metrization is advisable (Havel 1990). The basic idea behind it is simple enough. Since Euclidean distances obey the triangle inequality, a better initial approximation should be obtained by taking this into account. Moreover, since the triangle inequality induces correlations among the distances, the structures that best fit these distances should exhibit large-scale, correlated differences among their parts and hence should be more diverse.
One problem with this procedure is that if the triangle inequality limits among N atoms must be recalculated from scratch each time one of the ca. N(N - 1) \xda 2 distances is set, the amount of time required overall increases as O(N5). Fortunately, if the distances are chosen in a certain order, it is possible to improve this to an O(N3) algorithm. This order is dictated by the use of a data structure known as a shortest-paths tree, which enables the limits from one atom to all the others to be efficiently recomputed after each change. Thus, the distances from one atom to all the other atoms must be set first, followed by the distances from the next atom to all the remaining atoms, and so on. With this order, the algorithm is known as prospective metrization. It is, however, also possible to compute the shortest-paths tree from each new atom to all the preceding atoms, fix these distances, and proceed to the next atom, and so on; in this case the algorithm is called retrospective metrization.
exp (
,x) is used and the exponent
is chosen so that the expected value of the distance is given by:
Eq. 6
lij and uij are the current limits on dij. Thus, as the dilation factor
, the expected value
dij
max (lij, uij) = uij, while as
0, the expected value approaches the geometric mean of the lower and upper limits,
. If
= 1, then
dij
= (lij + uij) \xda 2,
Embedding
The basic idea behind embedding is that if you can compute coordinates whose distances are a close fit to the trial distances, then, since the trial distances obey the given distance constraints, the distances calculated from these coordinates ought to at least come close. Of course, if you had to rely on optimization to compute these coordinates, you might as well just fit the distance constraints directly, but it turns out that matrix best-fits such as these can often be computed exactly by eigenvalue methods (Crippen and Havel 1978).
Eq. 7
Consider the eigenvalue decomposition of the N X N metric matrix:
Eq. 8
where W = [wi, ..., wN] are the eigenvectors and
= diag (
1, ...,
N) is a diagonal matrix of the eigenvalues in decreasing order. Since M can be written as the product of the rank three matrix P = [p1 - p0, ..., pN - p0] with its transpose, it itself is positive semidefinite of rank (at most) three, so that
1,
2,
3
0 and
4, ...,
N = 0. If you now let
1 \xda 2 be the 3 X 3 diagonal matrix of square roots diag (
11 \xda 2,
21 \xda 2,
31 \xda 2) and Q = [q1, ..., qN] =
1 \xda 2 [w1, w2, w3]
, then the 3D Cartesian coordinates given by qi for i = 1, ..., N and q0 = 0 have the same metric matrix as the points pi, i.e.:
Eq. 9
As a consequence:
i
j
0, you have:
Eq. 10
Thus, the distances obtained from both sets of coordinates are also the same, so the coordinates differ only by translation and rotation. The orthogonality of the eigenvectors, however, implies that the new coordinates qi are principal axis coordinates.
Eq. 11
As can be seen, all the atoms except the i-th appear symmetrically in this formula, and hence no one atom is preferred as a whole. In addition, the use of the centroid as the origin further improves the stability of the algorithm.
Eq. 12
As k
, the polynomial product Tk (M) v converges to an eigenvector w1, whose eigenvalue
1 is maximum in magnitude. The next-largest eigenvalue is then found by applying the same procedure to the deflated matrix:
Eq. 13
and so on. The overall time required by this procedure increases as only O(N2), on average.
Optimization of the Coordinates
To reduce the violations of the constraints by the embedded coordinates to an acceptable level, further optimization is necessary. This involves minimizing a function that measures the total violation of the constraints by the coordinates, which is called an error function. These minimizations can be done using any gradient method, e.g., conjugate gradients. Because the coordinates obtained by embedding are much better than random, multiple stages involving different error functions are not necessary to obtain good convergence.
The Error Function
The error function used by DGII is given by:
Eq. 14
where:
Eq. 15
Eq. 16
Eq. 17
Here, Aij, Bij, and Cijkm enforce the hard-sphere lower bounds, the remaining lower and upper bounds, and the chirality constraints, respectively. The parameters
,
, and
are the weights of the various terms relative to the upper distance bounds, while
(typically 1.0) prevents the upper bound errors from getting too large when small upper bounds are present. The function vijkm (P) is the oriented volume, as defined above, and lijkm, uijkm are the lower and upper limits on its value derived from the distance, chirality, and torsion angle constraints.
Computation of Distances in Four Dimensions
A specialized method of avoiding local minima is available with DGII: the computation of distances in four dimensions.
Eq. 18
where
is a dimensionality weight. The initial values of the fourth coordinates should be scaled so that drastic fluctuations in the dimensionality error are not observed. Though it may seem strange, this artifice enables many problems with the local chirality of the molecule to be fixed.
Minimization and Analysis
The coordinates obtained from this procedure are generally close to a minimum (hopefully a near global one) and should also be quite close to being 3D. To make them perfectly 3D while also making the residual violations of the constraints as small as possible, the coordinates are usually projected down into 3D (if necessary) and the error function minimized to a precise minimum via a conjugate-gradient procedure. The number of parameters to play with here is quite large, but the defaults generally work well enough for most problems so that a detailed discussion is not needed. The recommended number of iterations is 1000, with early termination only if the rms gradient norm falls below ~ 0.001.
1
The term restraints is used almost interchangeably with constraints; strictly speaking, constraints are geometric conditions, while restraints are the pseudo-potentials used during dynamics and minimization to enforce consistency with the constraints.