| X-PLOR 98.1 |


For crystallographic applications there are strong theoretical reasons (involving the fact that not all atoms scatter equally) for preferring energy minimization algorithms that make some use of second derivative information (Tronrud 1992). Furthermore, the Powell energy minimization routine sometimes terminates with a significantly non-zero gradient for numerical reasons involving the step size used in the line search. In addition, more accurate and powerful minimization algorithms than the Powell routine would be expected to give better refined final models. For these reasons, the abnr energy minimization routine (Adopted Basis Newton Raphson) (Brooks et al. 1983) has been implemented in X-PLOR.
One cycle of minimization with the abnr routine is only slightly slower than one cycle of minimization with the Powell routine but a comparable level of convergence is usually achieved in fewer cycles. Refinement trials with both crystallographic and NMR data confirm that the abnr minimization routine gives final refined models with lower energies than the Powell minimization routine.
Implementation
The abnr energy minimization routine may be used instead of the Powell energy minimization for positional refinements including X-ray crystallographic or NMR data. The abnr energy minimization routine may also be used for the crystallographic refinement of individual atomic temperature factors.
The example scripts in /xtalrefine and /xtal_torsion make use of the abnr energy minimization method.

By eliminating bond lengths and angles as degrees of freedom from the molecule it is also possible to use much larger time steps for integrating the equations of motion with torsion angle dynamics than is possible with conventional molecular dynamics. The most useful consequence of the ability to run stable refinements with large time steps is that difficult refinement problems, which require simulated annealing at high temperatures in order to escape from incorrect conformations, can be tackled. For conventional molecular dynamics the computational cost of high temperature refinements is prohibitively high because very small time steps are required to control vibrations along the bond lengths and angles.
The application of torsion angle dynamics to X-ray crystallographic refinement has been shown to give somewhat improved results over conventional simulated annealing methods at a given temperature (Rice and Brünger 1994). More significantly, torsion angle refinements starting at very high temperatures give converged refinements for poor starting models -- a result that cannot be achieved using conventional simulated annealing protocols confined to lower temperatures.
The code for the torsion angle dynamics algorithm in X-PLOR (Rice and Brünger 1994), released in interim form in X-PLOR 3.851, has now been optimized to give better performance. For a 120-amino-acid protein, the integration of the equations of motion is approximately 40% faster than in the original implementation.
Implementation
Example scripts for X-ray crystallographic refinement with torsion angle dynamics are found in the subdirectory /tutorial/xtal_torsion.
Specification of topology
The following parameters are supported for specifying the simulated annealing protocol:
\[ \sum_h w(h)\,\bigl(|F_{obs}(h)| - k\,|F_{calc}(h)|\bigr)^2 \]
This target would give correct results if (i) the deviation between |Fobs(h)| and k|Fcalc(h)| were Gaussian, (ii) the mean deviation were zero and (iii) the standard deviation of the Gaussian were independent of the parameters of the atomic model. This last condition is clearly not met, since the errors have a changing phase component which depends on the atomic model.
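As a rough illustration of the least-squares residual target, the following Python sketch evaluates the weighted sum for a few reflections. The function names, the least-squares choice of the scale factor k, and the sample amplitudes are assumptions made for this example; they are not X-PLOR code or data.

```python
def ls_scale(f_obs, f_calc, weights):
    """Scale k minimizing sum_h w (|Fobs| - k |Fcalc|)^2 (an assumed choice of k)."""
    num = sum(w * fo * fc for fo, fc, w in zip(f_obs, f_calc, weights))
    den = sum(w * fc * fc for fc, w in zip(f_calc, weights))
    return num / den


def residual_target(f_obs, f_calc, weights, k):
    """Weighted least-squares residual over all reflections h."""
    return sum(w * (fo - k * fc) ** 2 for fo, fc, w in zip(f_obs, f_calc, weights))


# Hypothetical amplitudes for three reflections, unit weights.
f_obs = [120.0, 45.0, 300.0]
f_calc = [60.0, 20.0, 155.0]
weights = [1.0, 1.0, 1.0]

k = ls_scale(f_obs, f_calc, weights)
print("k =", k, "target =", residual_target(f_obs, f_calc, weights, k))
```

The target is zero only when every scaled calculated amplitude matches its observed amplitude, which is why the three statistical conditions on the deviations matter.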
To overcome these fundamental difficulties, new targets for crystallographic refinement have been derived from first principles using maximum likelihood statistics (Pannu and Read 1996). The resulting target takes the form
\[ \sum_h w_{ml}(h)\,\bigl(|F_{obs}(h)| - \langle |F_{obs}(h)| \rangle\bigr)^2 \]
Here <|Fobs(h)|> is the expected value of |Fobs(h)| and wml(h) is the associated weight; both depend on estimates of σA (Read 1986) taken from a set of cross-validation test data.
Practical tests involving minimizations of misfit molecular models with the maximum likelihood targets and the conventional residual target show that better convergence (lower Rfree) and reduced bias (smaller difference between R and Rfree) are obtained with the maximum likelihood targets (Pannu and Read 1996). Tests in which a maximum likelihood target was used in conjunction with torsion angle dynamics showed that a greater radius of convergence could be achieved than with other refinement methods (Adams et al. 1997).
More recently, a new maximum likelihood target, the structure factor amplitude with Hendrickson-Lattman phase probability coefficients, has been developed and this target is also available in the X-PLOR program. This target makes optimal use of all the experimental information that is available since the phase probability coefficients correctly model the uncertainty in the experimentally determined values of the phase angle (that is, as determined from MIR or MAD data). Preliminary tests of the maximum likelihood target with phase probability information (Pannu and Read, unpublished results) give very promising results.
Implementation
The maximum likelihood targets are specified using the target keyword within the xrefin parameter block (example: target=MLF1). The three maximum likelihood targets currently supported are:
There are two other parameters within the xrefin target block, siga and mbins, which can alter the behavior of maximum likelihood refinement.
The siga parameter provides the ability to update the σA estimate that is used in the calculation of the maximum likelihood target. The siga parameter may be set to fix, next, or refi=<integer>. The siga fix option (the default) fixes the estimates for σA at the values used at the beginning of the refinement. To update the estimates for σA after some cycles of refinement the siga next option may be used. The siga refi=<integer> option is used to update the estimates for the σA values after the given number of structure factor calculations. This latter option should be used with caution because frequent updating sometimes causes an increase in Exref which can cause the line search to be abandoned in minimization.
The mbins parameter sets the number of resolution bins used for estimation of the σA values. The default value of mbins for other calculations with X-PLOR that use this parameter (for example, resolution-dependent tabulation of R-values -- see page 164 of the X-PLOR 3.1 manual) is 8. If maximum likelihood refinement is performed the default value for this parameter automatically changes to the number of reflections divided by 200 or the number of cross-validation reflections divided by 20, whichever is the smaller. This default leads to a somewhat greater number of bins than the defaults originally suggested by Read and Pannu (the smaller of the number of reflections divided by 1000 and the number of cross-validation reflections divided by 50). Any value of mbins that is explicitly set in the X-PLOR script will override this default mode of operation.
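The two default rules for mbins can be compared with a short Python sketch. The function names are hypothetical, and the use of integer division with a lower bound of one bin is an assumption; the text does not say how X-PLOR rounds the result.

```python
def default_mbins_xplor(n_reflections, n_test_reflections):
    """Smaller of (reflections / 200) and (cross-validation reflections / 20)."""
    # Integer division and the floor of 1 bin are assumptions of this sketch.
    return max(1, min(n_reflections // 200, n_test_reflections // 20))


def default_mbins_read_pannu(n_reflections, n_test_reflections):
    """Smaller of (reflections / 1000) and (cross-validation reflections / 50)."""
    return max(1, min(n_reflections // 1000, n_test_reflections // 50))


# Hypothetical data set: 20000 reflections, 2000 of them in the test set.
print(default_mbins_xplor(20000, 2000))
print(default_mbins_read_pannu(20000, 2000))
```

For this hypothetical data set the X-PLOR rule yields five times as many bins as the Read and Pannu rule, consistent with the statement that the new default gives a somewhat greater number of bins.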
As in refinements against the other targets, a weight parameter, wa, is needed to scale the gradients in the X-ray energy to the gradients in the chemical energy. The value of wa can be estimated in the same way as for the other targets, by using a script (/tutorial/xtalrefine/check.inp) which carries out a short free dynamics run and then calculates the ratios of the chemical and X-ray energy gradients. Since the maximum likelihood refinements make use of (internally) normalized data, the correct value of wa is usually orders of magnitude smaller than the value that would be used for a refinement with the least-squares residual. Note that none of the maximum likelihood targets uses the phase weight parameter, wp (the MLHL refinement target deals directly with phase probabilities, not any explicit value for a phase angle), so wp should either be set to zero or omitted from the refinement script.
Andersen thermal coupling
Background
Simulated annealing calculations with the X-PLOR program have previously used the Berendsen thermal coupling method to control the temperature of the system (see pages 130-131 of the X-PLOR 3.1 manual). Temperature control with the Berendsen method is obtained by adding a force to each atom that is proportional to the individual atomic velocity. The overall scale constant for this force depends on the difference between the current temperature of the system and the desired temperature.
In contrast to the Berendsen thermal coupling method, which simply modifies the trajectories of all atoms in the system by rescaling along their current paths, the Andersen thermal coupling involves a stochastic process which alters the velocities and directions of the selected atoms at each time step. The general effect of Andersen thermal coupling is to break up correlated motions involving groups of atoms. This behavior increases local sampling of the conformational space but inhibits more global changes in the protein model.
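The difference between the two coupling schemes can be sketched in Python. This is a simplified illustration and not the X-PLOR implementation: the function names, the coupling constant, the per-step collision probability, and the exact functional form of the Berendsen friction term are assumptions. Velocities come out in internal units of sqrt(kcal/mol/amu); unit conversion is omitted.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def berendsen_friction_force(mass, velocity, t_current, t_target, coupling):
    """Force proportional to the atomic velocity; the overall scale depends on
    the difference between current and target temperature (assumed form)."""
    scale = coupling * (1.0 - t_target / t_current)
    return [-mass * scale * v for v in velocity]


def andersen_step(velocities, masses, t_target, collision_prob, rng):
    """With probability collision_prob per atom, replace the velocity by a
    fresh draw from the Maxwell-Boltzmann distribution at t_target."""
    updated = []
    for v, m in zip(velocities, masses):
        if rng.random() < collision_prob:
            sigma = math.sqrt(K_B * t_target / m)
            updated.append([rng.gauss(0.0, sigma) for _ in v])
        else:
            updated.append(list(v))
    return updated


rng = random.Random(0)
vels = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]]
print(berendsen_friction_force(12.0, vels[0], 600.0, 300.0, 2.0))
print(andersen_step(vels, [12.0, 14.0], 300.0, 0.5, rng))
```

The Berendsen term only rescales motion along each atom's current direction, whereas the Andersen step discards both speed and direction for the atoms it selects, which is what breaks up correlated motions.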
Implementation
The Andersen thermal coupling option may be invoked by setting two parameters within the velocity verlet command block.
Bulk solvent scattering correction
Background
It is common practice for X-ray crystallographers to exclude the very low resolution data from refinement and electron density map calculations because of a significant misfit between observed and calculated structure factors. This misfit occurs because the values of the low resolution structure factors are strongly affected by scattering from the bulk solvent in the crystal. Obviously it would be better to include all measured data in crystallographic calculations but to do so requires a suitable representation of the bulk solvent scattering (for a recent review see Badger, 1997).
This version of the X-PLOR program contains two alternative methods for modeling bulk solvent scattering. In the real-space approach, a model solvent density distribution is constructed on a grid around the protein model and the scattering from this density is obtained through the calculation of a set of solvent (partial) structure factors. These solvent structure factors can then be added to the protein structure factors in subsequent calculations. In the reciprocal-space approach, there is a direct re-scaling of calculated structure factors from the protein model in a way that implicitly accounts for the primary effects of the bulk solvent scattering. These methods are discussed in more detail below.
Real-space method
The X-PLOR program supports the construction of a variety of real-space bulk solvent models (Jiang and Brünger 1994). The first problem when using real-space models is to define the solvent-containing volume.
In the X-PLOR system, the solvent-containing space is defined by labeling grid points in the volume outside a protein space. This volume is defined by solvent-excluding radii centered on atomic co-ordinates. These solvent-excluding radii typically consist of the sum of the protein atomic radii plus a "probe" solvent radius. If this results in tiny holes within the protein, they can be eliminated by slightly over-expanding the solvent-excluding radii and then contracting back to the protein surface points to give the correct solvent volume.
Reciprocal-space method
The Babinet bulk solvent scattering correction method (Tronrud 1997) has also been included in the X-PLOR program. The underlying physical assumption of this method is that the density distribution of solvent atoms is the complement of the distribution of protein atoms. If a scale factor, Ksolv, is applied to account for the difference in mean protein and solvent scattering densities and a smoothing factor, Bsolv, is used to eliminate structural features from the protein, then Babinet's principle gives


Implementation
Real-space method
The real-space bulk solvent model is implemented through two main sets of parameters that are contained within the XREFIN...END command block.
The partial structure factors containing the solvent scattering contribution that are written by this script may be used in subsequent refinement calculations. The script /xtalrefine/slowcool_with_bulk_ml.inp illustrates a refinement protocol using partial structure factors calculated from a real-space bulk solvent model.
The theoretical value of the scale factor, Ksolv, is the average solvent density divided by the average protein density (typically 0.77 < Ksolv < 1.0) and the best value of Bsolv is usually in the range 100-400 Å2. According to Tronrud (1997) a value of about 280 Å2 is fairly universal as an optimal value.
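To give a feel for the quoted parameter ranges, the Python sketch below evaluates a commonly quoted form of the Babinet correction, in which the protein structure factor is multiplied by 1 - Ksolv exp(-Bsolv s^2/4) with s = 1/d. The formula, the function name, and the default Ksolv of 0.8 are assumptions of this example; the exact expression used by X-PLOR is not reproduced in this text.

```python
import math


def babinet_scale(d_spacing, k_solv=0.8, b_solv=280.0):
    """Multiplier applied to the protein structure factor at resolution d (in Å).

    Assumed form: 1 - Ksolv * exp(-Bsolv * s^2 / 4), with s = 1/d,
    so that s^2 / 4 equals (sin(theta) / lambda)^2.
    """
    s_sq_over_4 = 1.0 / (4.0 * d_spacing ** 2)
    return 1.0 - k_solv * math.exp(-b_solv * s_sq_over_4)


# The correction is strong at low resolution and vanishes at high resolution.
for d in (50.0, 10.0, 5.0, 2.0):
    print(f"d = {d:5.1f} Å  multiplier = {babinet_scale(d):.3f}")
```

With Bsolv near 280 Å2 the multiplier is essentially 1 beyond about 5 Å resolution, so under this assumed form the correction mainly affects the very low resolution reflections.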
An analytic solution for fitting these two scale constants (plus the overall scaling between observed and calculated structure factors) during the refinement is possible, but correlations between the various scale factors sometimes cause physically unrealistic values to occur. This analytic solution is not implemented in the X-PLOR program but an X-PLOR macro, /xtal_macro/bulksolvent, has been written to pre-determine an optimal value for Ksolv by carrying out trial calculations over a physically reasonable range of values.
Crystal structure deposition
Background
The Protein Data Bank currently uses a structure deposition procedure in which atomic coordinates and data are electronically submitted. The depositor of a structure is required to supply a variety of specified information in order to help validate and document the refinement of the model. The depositor can enter this information manually on a submission form or include it as part of the coordinate file, which allows it to be captured automatically. This latter option is preferred, since it relieves the depositor of the tedious task of entering the information manually (possibly inaccurately) and is a first step toward a highly automated system for deposition, validation, and publication of macromolecular structures. To assist this process, the X-PLOR program provides a facility for formatting the deposition information in the coordinate file of a refined model.
Implementation
Much of the information needed by the Protein Data Bank to create an informative and validated entry can be generated by an X-PLOR script and written into a coordinate file in a form suitable for automatic deposition.
The xtal_submit.inp script uses the pdbsubmission macro and writes the submission information in the PDB coordinate file. The mmcif_xtal_submit.inp script is similar but uses the mmcifsubmission macro and writes the information in the form of a rudimentary mmCIF file. The mmCIF deposition script and macro were written and provided by John Westbrook (Rutgers University). The mmCIF format is expected to become the preferred method for deposition with the Protein Data Bank in the future.
MAD phasing
This portion of text is adapted from program notes written by Axel T. Brünger.
Background
The following examples refer to a four-wavelength experiment but they are easily modified to accommodate between 2 and 5 wavelengths. Anomalous scatterer parameters are refined against the MAD data and phase probability distributions are computed. The algorithm consists of a combination of the Phillips and Hodgson (1980) method with a maximum-likelihood refinement using an error model similar to that of Terwilliger and Eisenberg (1987). Including cross-validation allows assessment of the quality of the refinement and phasing process.
Sequence of jobs
mad_merge.inp
3. Edit and run the mad_scale.inp file.
Macrocycles, cycles, and refinement steps
The refinement proceeds for the specified number of macrocycles ($macrocycle). Each macrocycle loops through the turned-on ($wave_<i>) LOCs in the specified sequence (#order) and carries out the specified number of cycles ($ncycle). Each cycle performs the specified number of refinement steps ($xstep, $bstep, $fstep, $fdstep).
LOCs
There are two LOCs for each wavelength except for the reference wavelength:
Phasing and refinement selections
Phase probability distributions are obtained for all reflections that have been measured at the reference wavelength, regardless of completeness at the other wavelengths and of the completeness of the Bijvoet pairs. In other words, phase probability distributions will be produced for reflections for which only a subset of wavelengths and Bijvoet mates was observed. For the refinement and scaling targets, however, only a subset of the reflections is used. The selection is stored in the array "w_sel". The example in mad_refine.inp selects those reflections for which all amplitudes satisfy the low amplitude, dispersive and anomalous difference and outlier cutoffs. The selection can be custom-tailored if necessary.
Cross-validation
The LOC R value, LOC value, and phasing power are computed for both the selected subset of reflections used to define the target ("working set") and the remaining reflections ("test set"). The test set can be used for cross-validation. According to Brünger, the cross-validated values are a more sensitive indicator for the quality of the refinement and the correctness of the refined sites.
Compressing the log file
Search for the word "TAB" in the X-PLOR log file. This produces a more concise summary of the progress of the refinement process. Look at the mad_refine.summary file to get a compact summary of the final refinement parameters and statistical indicators.
Assessing convergence
The program runs through macrocycles ($macrocycle), that is, the prescribed sequence of LOCs. For each LOC the program runs through a series of microcycles ($ncycle). Search for the word "shift" in the log file that X-PLOR produces to assess convergence, for example:
grep TAB mad_long.out | grep shift | grep "F_W1" | grep "1 SITE" | grep fp
Choice of refinement and phasing parameters
Convergence is affected by the number of refinement steps carried out ($xstep, $bstep, $fpstep, $fdpstep, $macrocycle, $ncycle). Accuracy of the phase probability distribution is affected by the integration step size ($phistep, in degrees). Depending on the resolution of the diffraction data, anisotropic B scaling (parameter $bscale) should be tried.
Lack-of-closure expressions
Individual wavelengths can be turned off and on for phasing and refinement by changing the $wave_1, $wave_2, ... flags. In general, two LOCs will be produced per wavelength: the F->F LOC and the Friedel(Fref)->Fref LOC, both relative to the reference wavelength. For the reference wavelength, only the Friedel(Fref)->Fref LOC is computed.
Absolute scale of structure factors and scattering factors
The probabilistic approach produces f' and f'' values that depend on the absolute scale of the structure factors. In addition, only differences between f' values are used in the probabilistic approach. If reliable estimates of f' and f'' values are available (for example, obtained by a theoretical computation of f'' at a "remote" wavelength with a small anomalous signal), they can be used to rescale and offset the scattering factors such that the resulting f' and f'' values will match the initial value for the specified wavelength $abs_scale_wave. The scattering factors and diffraction data are mapped as follows:
This file contains the refined f' and f'' values. Note that the probabilistic approach cannot produce absolute values for f' and f''. The scale of f' and f'' is dependent on the scale of the diffraction data (the scale factor that was applied to the reference wavelength by mad_scale.inp). In addition, f' is always relative to a baseline which has to be theoretically estimated from the observed fluorescence signal. If the "scaling" wavelength is specified, f' and f'' are offset and re-scaled to match the scaling wavelength. The refinement is performed independently for the F -> Fref and Friedel(F) -> Fref LOCs. This provides a way of assessing the convergence of the refinement. According to Brünger, the corresponding values should normally be close (within 10% of the maximum f' and f'' values).
1. The Fref-Fref LOC is not meaningful, thus no refined f', f'' values are given.
3. The scaling wavelength does not have to be equal to the reference wavelength.
This file contains the phase probability distributions (stored in arrays PA, PB, PC, PD), the corresponding phase centroids (stored in the phases of the FOBS array), and the corresponding figures-of-merit. This file is ready to be read by subsequent jobs to produce experimental electron density maps, perform density modification, phase combination, and refinement with or without phase information.
This file contains the refined coordinates of the anomalous sites. There is one coordinate set for each LOC, that is, each LOC is treated independently. The SEGIDs are encoded according to the particular LOC, for example:
This file lists the LOC R value, LOC value, phasing power, initial and final coordinates and other refined parameters. The overall figure-of-merit distribution is listed at the end.
Statistical indicators for the working set
The following notes can be used as guidelines to interpret the various statistical indicators:
1. Phasing power values > 1.0: excellent LOC
2. R-Cullis < 0.6: excellent LOC; R-Cullis < 0.9: usable LOC
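These guidelines can be expressed as a small Python helper for screening values copied from the summary file. The function name and the returned labels are hypothetical; values outside the listed thresholds are simply left unclassified, because the guidelines above do not say how to interpret them.

```python
def classify_loc(phasing_power=None, r_cullis=None):
    """Return the guideline labels that apply to one LOC (working set)."""
    notes = []
    if phasing_power is not None and phasing_power > 1.0:
        notes.append("phasing power: excellent LOC")
    if r_cullis is not None:
        if r_cullis < 0.6:
            notes.append("R-Cullis: excellent LOC")
        elif r_cullis < 0.9:
            notes.append("R-Cullis: usable LOC")
    return notes


# Hypothetical indicator values for two LOCs.
print(classify_loc(phasing_power=1.4, r_cullis=0.55))
print(classify_loc(phasing_power=0.8, r_cullis=0.85))
```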
How to obtain phases for the non-anomalous structure factor
Use X-PLOR to read the reflection file with centroid phases output from mad_refine.inp and use an X-PLOR script with the xrefin statement:
write reflection fobs sigma
1. At the current stage of the program, individual heavy atom directions cannot be fixed during refinement: the heavy atom is either refined or kept fixed. This may cause drifting of the heavy atom in cases where one of the directions is arbitrary (for example, for one heavy atom in P21, the y-coordinate is arbitrary).
4. Please make sure you run enough cycles to get good convergence (see notes below).
7. The refinement can be very slow because of the generality of the method.
Data conversion
Background
The crystallographic diffraction data that you may wish to use in calculations with the X-PLOR program could initially be in any one of a variety of formats, depending on the software that was used for the initial data processing or the software used for other calculations through which the data may have been passed.
Implementation
When reflection data are in a format other than the X-PLOR format, a READsf...END command block can be placed inside the REFLection...END command block to interpret the data.
An example input script, data_convert.inp, is provided in the tutorial/dataconversion subdirectory to illustrate reading a foreign reflection file.

NMR structure determination
Torsion angle dynamics for NMR structure determination
Background
The basic reasons for using torsion angle dynamics for structure determination with distance restraints are similar to those for X-ray crystallographic refinement -- elimination of unwanted degrees of freedom and the ability to achieve stable runs at very high temperature without incurring large computational cost. Applications of torsion angle dynamics to NMR structure determination were described by Stein et al. (1997). For moderately sized proteins (~100 amino acids) a simple protocol using torsion angle dynamics was found to have very high (>85%) success rates for structure determination.
In summary, this protocol consists of the following stages:
1. "Cooking" at 50,000 K for 15 ps with the energy constant for the van der Waals parameters scaled by 0.1.
2. Cooling to 1000 K for 15 ps with ramping of the van der Waals parameters to full scale.
3. Cooling to 300 K for 6 ps using conventional molecular dynamics.
4. 1000 steps of conjugate gradient minimization.
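The four stages can be written down as a plain data structure, for instance when planning run times. This Python sketch is only a bookkeeping aid: the field names are hypothetical, the assignment of torsion angle dynamics to stages 1 and 2 is inferred from the context, and the starting value of the van der Waals ramp is assumed to be the 0.1 scale of stage 1.

```python
# Stages as listed above; entries marked "assumed" are not stated in the text.
PROTOCOL = [
    {"stage": "cook", "method": "torsion angle dynamics",  # method assumed
     "target_temp_k": 50000, "duration_ps": 15, "vdw_scale": 0.1},
    {"stage": "first cooling", "method": "torsion angle dynamics",  # method assumed
     "target_temp_k": 1000, "duration_ps": 15,
     "vdw_scale": (0.1, 1.0)},  # ramp to full scale; start value assumed
    {"stage": "second cooling", "method": "conventional molecular dynamics",
     "target_temp_k": 300, "duration_ps": 6,
     "vdw_scale": None},  # not stated in the text
    {"stage": "minimize", "method": "conjugate gradient minimization",
     "steps": 1000},
]


def total_dynamics_ps(protocol):
    """Total simulated time over the dynamics stages, in picoseconds."""
    return sum(stage.get("duration_ps", 0) for stage in protocol)


print(total_dynamics_ps(PROTOCOL), "ps of dynamics before minimization")
```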
Structure determination using ambiguous restraints
This portion of text is adapted from program notes written by M. Nilges.
Introduction
ARIA (for Ambiguous Restraints for Iterative Assignment) is a fully automated iterative method that performs a series of tasks that are typically necessary in the calculation and refinement of structures from NOE data (Nilges et al. 1997).
Each data set is interpreted (calibrated, assigned, and cleaned) separately, before the different sets are merged to give one distance restraint list. Different types of data sets can be mixed; the operations performed by ARIA are determined by simple flags. Thus, one can use a list of peaks from an automatic peak picker, where calibration, assignment, and cleaning have to be performed. A single file (called project.xplor) needs to be edited to set all flags and parameters for the spectra assignment and the calculation.
data: This directory contains a subdirectory for the sequence-related data, and one for each dataset used (NOEs, and other data such as dihedral angles, etc.). It also contains a list of the names of all assigned spectra to be merged (spectra_ass.list), which is created at present by hand, for example:
assign: Contains input files and X-PLOR include files to perform the partial assignment, cleaning and merging.
protocol: The read-only files of the simulated annealing protocol.
toppar: The energy parameters used for the structure determination.
The subdirectories analysis, protocol and assign are (mostly) read-only and usually need not be modified. toppar also should not have to be modified, but it seems advantageous to keep exactly the parameters used with the project for documentation.
Floating chirality: Define the groups for which floating assignment is to be performed in the file setup_swap.xplor in data/sequence.
NOE data: The NOE data is organized by using a separate directory for each data set.
spectra_ass.list: Create a file in directory data that contains a list of all file names of assigned spectra (the $restraintfile variable in project.xplor, prefixed with "@@NEWIT").
The second section contains file names of the structure file, the template file, and the file root for all coordinate output files.
The third section contains parameters for each dataset used in the refinement.
The fourth section contains parameters for each iteration (any of the above parameters can be put here if they are to be varied from iteration to iteration).
The last section contains some modifications of parameters defined above in order to specify correct directories (via environment variables).
<ANALysis-restraints-statement>:==
The syntax required for direct coupling refinement is a set of statements of the form COUP {<coupling-statement>} END
The syntax required for direct carbon chemical shift refinement is a set of statements of the form CARB {<chemical-shift-statement>} END
The syntax required for direct 1H chemical shift refinement is a set of statements of the form PROTONSHIFTS {<proton-shift-statement>} END
Techniques for calculating NOE intensities and their gradients with respect to the atomic coordinates have improved over the last few years. The method that is implemented in X-PLOR is based on the ideas of matrix doubling and Gaussian quadrature (Yip 1993). The present implementation is significantly faster than the earlier implementation based on Yip and Case (1989). To date, accelerations of one to two orders of magnitude have been obtained, depending on the system studied.
Implementation
The new method is implemented as a new option in the relaxation/cutoff parameter group. Within the mode specifier, the new keyword MATD is added as an option.
Structure refinement using additional angular information
The recent interest in structure determination of larger macromolecules necessitates the use of triple labeling (H-2, C-13 and N-15) techniques. With triple labeling the number of measurable NOE interactions which are available for the structure refinement has dramatically decreased. For structures in the size range already accessible by NMR methods, the demand for more accurately determined models requires new types of structural information to be included in the NMR refinement process. In recent years two important classes of additional angular information have emerged: the cross-correlated dipole-dipole relaxation rates (Reif et al. 1997; Griesinger et al., in press) and the residual dipolar couplings (Tjandra et al. 1997).
In the following we describe the X-PLOR commands necessary to carry out direct structure refinement against these two classes of restraints.
Structure refinement using cross-correlated dipole-dipole relaxation rates
An account of the basic principles of NMR structure determination with information from cross-correlated dipole-dipole relaxation rates is available in the paper by Reif et al. (1997). The experimental data lead to determinations of the angles between bond vectors within the molecule. This version of the X-PLOR program is able to include this information in the structure refinement.
The syntax for carrying out direct structure refinement using cross-correlated dipole-dipole relaxation rates is a parameter block of the form
XCDDipole <xcddipole-statement> END
with
<XCDDipole-statement>:==
During the refinement the appropriate energy flag should be included:
flags include xcdd end