| Property Prediction |


Topic
Reference
Loading forcefields
The Cerius2 Simulation Tools book
Preparing conformers for flexible molecules
The Cerius2 Conformational Search and Analysis book

Step 1: Setup
Before you start, you must load and configure an appropriate Cerius2 forcefield. You must then prepare molecules to include in the asymmetric unit of the crystal structure being studied, and appropriately configure prediction procedure parameters and options. For details, see "Setting up" on page 129.
Step 2: Monte Carlo packing simulation
The Monte Carlo packing simulation searches each specified space group for the most probable crystal structures. For details, see "Monte Carlo packing simulation" on page 138.
Trajectory files generated by the Monte Carlo simulation for each space group contain large numbers of unoptimized structures; those with the lowest energy are the most likely to occur experimentally. At the local minima of the energy hyper surface explored during the simulation, clusters of many very similar structures (of which only the lowest energy representative is significant) are expected.
The low-energy structures output by cluster analysis are still energetically unrefined and require optimization with respect to all degrees of freedom.
Step 5: Cluster minimized structures
Initially distinct structures may become very similar, or even identical, during minimization. You should remove such redundant structures from the trajectory files produced by energy minimization by running a final cluster analysis. The remaining structures are saved in an output trajectory file.
Having performed steps 2 through 5 for each of a number of space groups, you are left with several output files (one per space group) containing optimized structures ready for final analysis and model extraction. However, it is possible that structures in the output from different space groups are identical. You may want to merge the output trajectory files and perform a final cluster analysis to remove such redundant structures. See "Merging trajectory files" on page 155. For more detailed information on trajectory files, please see Appendix B, "File Formats".
After each polymorph run, the resulting trajectory file can be analyzed using the Analysis item on the POLYMORPH card. Please see "Trajectory file analysis and model extraction" on page 149 for more information.
It is recommended that you confirm the accuracy of the polymorphs obtained by the prediction procedures by performing one or more reliability checks on them. See "Reliability checking" on page 156.

General Methodology
Setting up
Before you start a polymorph prediction run, you must appropriately configure a Cerius2 forcefield and specify and prepare models of all molecules in the asymmetric unit of the crystal. Once you have defined the asymmetric unit, you can initiate further steps in the prediction sequence (Monte Carlo packing simulation, cluster analysis, minimization, and final clustering) as a complete, automated sequence or as individual steps. Either way, you should set parameters and options for each step prior to initiating it.
Configuring forcefields
You should use a forcefield that is appropriate for the atom types (that is, the elements and their hybridization, as well as the local geometry) in the structure. The latest version of the Dreiding forcefield, Dreiding2.21, is the most appropriate for molecular crystals without metal atoms and is recommended for Polymorph Predictor runs.
Preparing models
You should prepare and specify models that will be included in the asymmetric unit of the crystal structure being studied (including the number of times the model should appear in the unit) using the Polymorph Run control panel.
Each model you specify for inclusion in the asymmetric unit should be energetically minimized and have charges calculated and assigned to its atoms before polymorph prediction begins.
1. Perform a Gaussian optimization of the molecules in the asymmetric
unit, preferably with the 6-31G** basis set.
2. Obtain Gaussian ESP charges for all atoms.
The Monte Carlo packing simulation treats all molecules as rigid structures. For flexible molecules, you should prepare conformers corresponding to local energy minima of the conformational space using Conformer Search and Conformer Analysis. After preparing the conformers, perform the polymorph prediction procedure on each one. (See the Cerius2 Conformer Search and Analysis book.)
Simulation parameters
The Monte Carlo packing simulation, cluster analysis, and energy minimization processes each have a number of associated options and parameters that affect their operation. Unless you want to use default values, you should access the preferences panels that contain the controls for these parameters, and set them appropriately prior to initiating any of these procedures.
Predicting Polymorphs
Polymorph Predictor limitations
Flexible molecules
Polymorph Predictor treats all molecules as rigid structures. For flexible molecules, conformers corresponding to local energy minima of the conformational space must be prepared and the polymorph prediction procedure performed on each one.
Polymorph Predictor calculates likely crystal structures with forcefield technology. Many different forcefields are available, each optimized for different elements and structure types. The accuracy of a prediction run is therefore related to the suitability of the forcefield for the crystal being studied.
Prediction time increases in proportion to the number of:
Intramolecular symmetry
Polymorph Predictor assumes no intramolecular symmetry -- there should be no bonds between the asymmetric unit and any symmetry copies of the asymmetric unit.
Running a prediction
You can initiate a complete polymorph prediction sequence or you can launch each step manually from the Polymorph Run control panel. The following steps are performed for each space group to be searched during the Monte Carlo simulation (please see the online help for more information on the control panels):
Restarting interrupted prediction procedures
In addition to the trajectory files that are generated during each procedure, restart and summary files (with the same names as the trajectory file, but with extensions of .rst and .summary
rather than .pmp) are periodically written. If the run completes successfully the restart file is
deleted, but not the summary file.
The molecules to include in the asymmetric unit of the crystal structure you are studying are defined (by model number) in the lower portion of the Polymorph Run control panel. You can also specify that multiple copies of a model be incorporated into the asymmetric unit. The content of each model is treated as a rigid unit during the Monte Carlo simulation.
To run a complete polymorph prediction sequence
1. Load and/or sketch molecules that you want to include in the
asymmetric unit of the crystal system you want to study.
| If you use the OFF forcefield on a non-periodic model after selecting the Ewald method, the method will automatically switch to spline. |
5. Open the Polymorph Run control panel by choosing Run from
the POLYMORPH PREDICTOR card.
b. Set the parameters and options on the panel and its two subpanels:
Polymorph MC Parameters and Polymorph MC
Output. See "Setting Monte Carlo preferences" on page 140.
8. If the default settings for the cluster analysis are not satisfactory,
specify new preferences:
b. Set the parameters and options on the panel and the Polymorph
Cluster Output subpanel. See "Setting Cluster Analysis
preferences" on page 145.
9. If the default settings for energy minimization are not satisfactory,
specify new preferences:
b. Set the parameters and options on the panel and the Polymorph
Minimize Output subpanel. See "Setting minimization
preferences" on page 147.
11. Perform reliability checks to verify the accuracy of the polymorphs
obtained by the prediction procedure. See "Reliability
checking" on page 156.
To predict polymorphs manually
1. Load and/or sketch molecules that you want to include in the
asymmetric unit of the crystal system you want to study.
| If you use the OFF forcefield on a non-periodic model after selecting the Ewald method, the method will automatically switch to spline. |
5. Open the Polymorph Run control panel by choosing Run from
the POLYMORPH PREDICTOR card.
b. Set the parameters and options on the panel and its two subpanels:
Polymorph MC Parameters and Polymorph MC
Output. See "Setting Monte Carlo preferences" on page 140.
9. If the default settings for cluster analysis are not satisfactory,
specify new preferences:
b. Set the parameters and options on the panel and the Polymorph
Cluster Output subpanel. See "Setting Cluster Analysis
preferences" on page 145.
11. If the default settings for energy minimization are not satisfactory,
specify new preferences:
b. Set the parameters and options on the panel and the Polymorph
Minimize Output subpanel. See "Setting minimization
preferences" on page 147.
14. Perform reliability checks to verify the accuracy of the polymorphs
obtained by the prediction procedure. See "Reliability
checking" on page 156.
Please see the online help for more detailed information on the control panels.
Monte Carlo packing simulation
The first step of a polymorph prediction sequence is a Monte Carlo simulation of the thermodynamic movement of the system for each selected space group. The simulation consists of two phases, heating and cooling, and is thus termed simulated annealing. The Metropolis algorithm is used to determine whether generated trial structures are accepted or rejected.
The simulation
Generating initial
structures
An initial structure is generated for each space group from the models specified on the Polymorph Run control panel. Each of the specified models is copied into the current model space (which should be empty) before starting the simulation. If more than one copy of any model is required, the fragment is copied the appropriate number of times.
Firstly, each trial crystal is heated. During heating, the temperature for each new trial, Tnew, is obtained from that used for the previous trial, Told, using a user-definable heating factor, Th, by:
Eq. 1
Heating continues until either a preset maximum temperature is reached or a specified number of consecutive trial moves accepted.
If a trial is accepted during cooling, the temperature to use for the next trial Tnew, is obtained from that used for the previous trial, Told, using a user-definable cooling factor, Tc, by:
Eq. 2
The default cooling factor specified is modified according to the number of rigid units in the asymmetric unit, N, using the following algorithm:
Eq. 3
The Monte Carlo move factor is updated with the same algorithm used during the heating phase. The run finishes when either the temperature reaches a preset minimum, a specified maximum number of trials have occurred, or the Monte Carlo move factor becomes lower than a minimum allowed value.
Trial steps
Each Monte Carlo trial consists of the following steps:1. Make a random change to the orientation of each rigid unit
within the asymmetric unit.
4. If there is more than one rigid unit in the asymmetric unit:
5. Remove van der Waals contacts from the crystal:
6. Calculate the energy and density of the crystal and apply a standard Metropolis acceptance/rejection test. In addition, a density constraint is applied: if the density of the structure is below 0.3 g/cm3, the step is always rejected.
Setting Monte Carlo preferences
You specify the parameters and options that affect the Monte Carlo packing simulation using options on (or accessible from) the Polymorph MC Preferences control panel (please see the online help for control panel information).
Search parameters
You can individually define search parameters such as the heating and cooling factors (Th and Tc) described in the previous section (see page 138). Alternatively, you can conveniently adjust the default values of all the parameters necessary to provide a coarse, medium, or fine search of phase space during the simulation using the Search Level popup.
The most frequently used options are available on the Polymorph MC Preference panel. Others are located on the directly accessible Polymorph MC Parameters control panel (by pressing Preferences...).
Effects of changing search parameters
This section discusses the effects and consequences of changing three of the most significant search parameters:
The purpose of the heating phase is to find an acceptable starting temperature for simulated annealing (that is, the cooling phase). Ideally, the cooling phase should start from a temperature at which all possible moves are accepted, that is, where phase space can be sampled completely randomly. In order not to waste cooling steps during which only random moves are made, you should make this temperature as low as possible. It is also important that this temperature is determined with the minimum of effort. If cooling is started after a certain number of trials are accepted in a row then there is a good likelihood that the desired temperature has been reached somewhere early in the heating phase.
The rate of cooling is the most important factor during cooling. It is useful to think of a particular cooling rate as defining a level of detail for the search of phase space. A longer search at a particular temperature increases the probability of finding the energy minima at a particular level of detail.
Space groups
You can search up to 20 space groups during the simulation. However, such a search is very time consuming and may not yield significantly greater accuracy than a search of just the few most significant space groups. This is illustrated by the following table, which lists the 17 most common space groups (which appear on the Space Group Selector popup) and the percentage of organic crystals that occur in them.
You should use brief descriptions, with single spaces placed between the descriptor for the lattice type and each of the symmetry directions. You should represent subscripts, as used for screw axes, by normal numbers (so that a screw dyad is represented by 21, a triad by 31 or 32, and so on). Represent inversion axes with minus signs (for example, -3, -4, -6). Some examples are:
Please see the online help for detailed information on the control panels.
The Monte Carlo simulation outputs large numbers of unoptimized structures for each space group. You can expect these to include low-energy structures and clusters of many very similar structures that are obtained at the local minima of the energy surface of the crystal.
minimization output
Initially-distinct low-energy structures produced by cluster analysis of Monte Carlo output can become very similar during energy minimization. You can remove these redundant structures by performing a final cluster analysis procedure on trajectory output from energy minimization procedures.
Procedure
The Polymorph Predictor automated cluster analysis procedure consists of the following steps:1. The source trajectory file is traversed and the unclustered structure
with the lowest energy is identified and written to the output
trajectory file.
1. Groups atoms in the asymmetric unit cell by element, atom name or forcefield type for the low energy reference structure.
5. Repeats steps 1 through 4 for all the clusters.
Setting Cluster Analysis preferences
The parameters and options that affect the cluster analysis procedure are specified using options on (or accessible from) the Polymorph Cluster Prefs control panel (see the online help).
Input file
By default, the trajectory file output by the last run prediction procedure is selected as the input file for cluster analysis. If you do not want to use this file, choose another using the file selector on the panel.
Clustering parameters
You can specify the cutoff distance for interatomic relationships to be examined during clustering such that only sensible interactions are considered.
Clustering tolerances for Monte Carlo and minimized output
The clustering tolerances required for output from the Monte Carlo and energy minimization phases are significantly different. The clustering level required after the minimization phase needs to be much finer than that required after the Monte Carlo search.
The Monte Carlo search outputs a lot of raw structures that differ significantly. Here you should set a tolerance that performs as much data reduction as possible while obtaining as many different starting structures for minimization as possible.
Minimized structures are fully optimized; a small difference between these structures may represent a different polymorph. At this stage, you typically have tens of structures and can, therefore, afford to perform a more detailed search. It is strongly recommended that you use the defaults associated with Fine clustering for all post minimization clustering. You can identify any remaining truly similar structures by visual comparison and/or using other Cerius2 instruments (for example, Diffraction).
You can use the Cluster Measure tool available on the Polymorph Properties control panel (see "Trajectory file analysis and model extraction" on page 149) to investigate sensible clustering values for an output file. To do this extract the first frame into a model and obtain measures against all other frames.
Output options
Output options, such as output file naming, progress plotting, and text window output are set on the Polymorph Cluster Output control panel, which is accessed from the Polymorph Cluster Prefs control panel (by pressing Output...).
panels.
Energy minimization
Cluster analysis outputs low energy cluster representatives that are still energetically unrefined and require optimization with respect to all degrees of freedom.
The Polymorph Predictor energy minimization procedure can also be performed on individual models. The minimizer used in the procedure is the same one used by other Cerius2 modules, and can therefore be used with other structures in other modules.
Setting minimization preferences
You can specify the parameters and options that affect energy minimization using options on (or accessible from) the Polymorph Minimize Prefs control panel (see the online help).
By default, the trajectory file output by the last run prediction procedure is selected as the input file for energy minimization. If you do not want to use this file, choose another using the file selector on the control panel.
Various termination criteria for the minimization in Cerius2 exist: a target root mean squared force or a maximum number of minimization steps.
If rigid bodies are defined on the molecules forming the asymmetric unit, the Minimizer normally recognizes these rigid bodies and does not change the relative position of atoms within a rigid body. It is often desirable to compare results with and without rigid body minimization -- for this purpose, a switch Ignore rigid bodies is provided, allowing you to rerun a minimization on a trajectory file as if no rigid bodies had been defined.
You can set output options, such as output file naming, progress plotting, model and text window output, on the Polymorph Minimize Output control panel (accessed from the Polymorph Minimize Prefs control panel by pressing Output...).
Notes on rigid body minimization
Polymorph Predictor analysis tools allow you to study the properties of the contents of a trajectory file, identify the most interesting structures, then extract them into model spaces. Three steps are involved, the controls for each contained on a separate control panel available from the Analysis pullright on the POLYMORPH card.
Select trajectory file
By default, the trajectory file output by the last-run prediction procedure is selected as the input file for analysis and extraction. If you do not want to analyze this file, choose another using the file selector on the Polymorph Analysis File control panel (see the online help).
You can also search for structures in the file with the lowest or highest values (as appropriate) for these properties.
You can compare an existing model to the currently-selected analysis trajectory file using the clustering algorithm. Structures are sorted by similarity to the reference model, and their frame numbers listed in the text window. A small Cluster Measure Preferences panel allows you to set the cluster parameters used for this comparison. Note that changing the settings specified on this panel does not affect the clustering parameters used during a Polymorph Prediction run. These can only be set on the Polymorph Cluster Prefs panel, which is available from the main Polymorph Run panel (please see "Setting Cluster Analysis preferences" on page 145 for more information).
Having identified a structure for further study, you can extract it into the current model space using the controls on the Polymorph Extract control panel (see the online help). You specify the structure to extract using its frame number; information on the selected structure can be listed in the text window prior to extraction. You can also write frames represented by points selected on a graph to a new trajectory file.
Automatically comparing powder spectra
The Polymorph Predictor identifies a limited number of likely low-energy crystal structures for a given compound. To determine which of these candidate structures is/are experimentally observed, a common technique is to compare simulated powder patterns with experimentally observed powder patterns.
By default, the trajectory file output by the last-run prediction procedure is selected as the input file for the powder comparison. If you do not want to analyze this file, choose another using the file selector on the Polymorph Analysis File control panel.
Next, you need to specify with which experimental you wish to compare the frames in the trajectory. The Load Powder ... button on the Powder Comparison control panel opens the 1-D Experimental Data control panel (also accessible from the DIFFRACTION-CRYSTAL card). A large variety of different file formats is supported, the only restriction being that the abscissae of the experimental spectrum need to be equally spaced.
It is often necessary to subtract the background from the experimental spectrum and/or to artificially broaden it, in order to allow a meaningful comparison between the simulated and experimental spectrum. Broadening and background subtraction are performed using the Prepare Powder button on the Powder Comparison control panel, which opens the 1-D Data Transformations control panel.
To perform a meaningful comparison, you must ensure that the settings used to generate the simulated spectra mimic the experimental conditions as closely as possible. In particular, the display range, the radiation wavelength and instrumental or other peak broadening factors may need to be adjusted. Usually, the chosen display range should be similar to the range on which the experimental data is defined.
The CMACS measure is calculated individually on a number of equally spaced intervals over the display range, and the results are averaged over these intervals. The default for the number of intervals is 10. In cases where the spectrum contains a very large number of peaks, you may wish to increase this value. You can change this value using the CMACS Intervals input field on the Powder Comparison control panel.
Since you may wish to carry out more than one comparison for a given trajectory file, you need to specify a name for each comparison. Simply enter the desired name in the Results Identifier field, or use the default name suggested.
Analyze measures of comparison
Once a powder comparison has been done, you can use the standard tools on the Polymorph Analysis control panel to analyze the results. If you have carried out more than one comparison for the trajectory file, you must choose which results to analyze, by using the pulldown menu in the lower half of the Powder Comparison control panel. All the analysis tools extract information from the calculation specified in this field. Using the Property pulldown on the Polymorph Analysis control panel, you can now choose which of the four measures of comparison you wish to analyze. You can plot the measures as a function of frame number, find the frames with the best match (using the Search for Lowest ... Frames option), and create a new trajectory containing only frames whose powder pattern is close to the experimental pattern, indicated by small values for the comparison measures.
If you do not remember the Diffraction Crystal settings and/or the experimental powder pattern to which a particular comparison corresponds, use the Show settings button on the Powder Comparison control panel. It prints, among other things, a number of DIFF-CRYSTAL commands which, if executed as a script, will reset the DIFFRACTION-CRYSTAL settings to the values used for the powder comparison identified by Comparison results to analyze.
You can delete the comparison identified by Comparison results to analyze using the Delete option on the Powder Comparison control panel.
1. Select a trajectory file using the Polymorph Analysis File control
panel.
2. Load an experimental spectrum using the Load Powder ...
option.
6. Type in an identifier for the calculation in the Results Identifier
field.
7. Perform the comparison by clicking the Compare using ... button.
12. Delete a calculation using the Delete option.
Merging trajectory files
Having clustered, minimized, and re-clustered the output from the Monte Carlo search for a number of space groups, you are left with several trajectory files (one per space group searched) containing structures that are likely to be found experimentally.
Please see the online help for more information on the control panels.
Reliability checking
It is recommended that you confirm the reliability of the search procedures by performing one or more checks on the validity of the final output structures. This section describes a few such tests.
properties
Where experimental data is available, comparison against properties calculated for a predicted structure using Cerius2 computational instruments can be very effective.
If a structure is predicted in a particular symmetry then it should also be predicted in a lower symmetry provided that the lower symmetry has sufficient degrees of freedom (that is, permitted translations, cell angles, cell lengths, and types of symmetry operators) to reproduce the higher symmetry.
| In certain cases, for example, if the molecule is not chiral, P 1 (Z=2) should reproduce P -1 (Z=1). However, if the molecule is chiral then P 1 (Z=2) should not reproduce P -1 (Z=1). |
Change the FF dielectric parameters
Current forcefield technology neglects polarization effects. If the stability ranking of predicted structures is not affected by changing the values of the dielectric constant epsilon (used in electrostatic calculations), then the prediction can be assumed reliable. To find out more about the dielectric parameters and recalculating the energies of your structures, see Cerius2 Simulation Tools.

Theory
Predicting polymorphs
Molecules tend to exhibit polymorphism, that is, they crystallize in many modifications, as a result of the physical conditions and the manner in which the crystals are obtained. Different polymorphs possess different values of the Gibbs free energy (G). However, it does not necessarily follow that a crystallization process yields the modification with the lowest value of G. In fact, the relationship between crystallization conditions and the polymorph obtained is not generally understood.
Theory
As stated earlier, experiments are likely to find crystal structures with a fairly low value of G. Thermodynamics states that G decomposes into
with energy E, pressure p, volume V, temperature T, and entropy S.
Different crystal structures have different entropies at finite temperatures. TS therefore has a significant influence on G. However, because it is difficult to compute the entropy of a crystal, TS is not considered. Neglecting this factor changes the calculated energy slightly from the true value of G at finite temperatures. However, these effects are only expected to reorder the relative stabilities of the different low energy structures predicted at zero kelvin -- the lowest energy structures Polymorph Predictor obtains should represent those found experimentally, but probably with different relative stabilities.
,
, and
). The content of the unit cell is specified by the coordinates of the asymmetric unit and by the space group.
Reliability checks
In global optimization problems there is usually no guarantee that the global minimum has been located. In crystal packing problems, these difficulties are worsened since the complete lower part of the energy spectrum must be calculated. Polymorph Predictor provides many adjustable parameters that allow you to fine tune the packing simulation and increase search reliability. However, such increased reliability comes at the cost of dramatically increased processing demands, so it is not generally practical to perform the most detailed search possible.
Please see the following references for a more detailed discussion of the theory and background of the prediction procedures implemented in Polymorph Predictor (Gdanitz 1992, Karfunkel and Gdanitz 1992, Karfunkel and Leusen 1992, Karfunkel et al 1993, Gdanitz et al 1993, Karfunkel et al 1994).
Simulated annealing theory
Polymorph Predictor's Monte Carlo packing algorithm performs a simulated annealing procedure intended to search for the lowest minima of the energy function E of molecular crystals. Such a search is too difficult for classical algorithms for numerous reasons:
The simulated annealing method works around these difficulties by treating the search for the global minimum of E as a thermodynamic problem. At some non-zero temperature, T, the crystal changes its structure randomly, and its energy fluctuates accordingly. To prevent the simulation from becoming trapped in a local minimum, cooling begins at a relatively high temperature; every crystal structure (within the constraints of the space group and asymmetric unit contents) can then theoretically be reached, and ergodicity therefore ensured.
Automatically comparing powder spectrad
The automatic comparison of powder patterns relies on the availability of a computational measure to indicate "closeness" between powder spectra. Each powder pattern is characterized by a set of intensity values at N observation points k = 1, ..., N, corresponding to a set of diffraction angles. Four different figures of merit have been implemented to enable you to judge how "close" the simulated powder pattern from a given trajectory frame is to the experimental pattern. In each case, a small value indicates a close match and a large value indicates that the powder patterns are very different. The four measures of comparison are:
Eq. 4
Eq. 5
Eq. 6
Eq. 7
The weights wk at the observation points k are defined via wk = 1/Ik , where Ik = Yk(exp) + Yk(background) is the experimentally observed intensity. Ik must be positive everywhere and Yk(background) should be previously calculated and subtracted from the experimental spectrum using the Load Powder option. The first three of these measures are essentially Rietveld-like measures. If no background is available, wk = 1 is assumed for RWP and R-INDEX. Yk(calc) is the calculated intensity, rescaled such that
Eq. 8
is minimized.
Belsky, V.K. and Zorkii, P.M., Acta. Cryst. A33 1004 (1977).
References