Property Prediction



7       Polymorph Predictor

The Polymorph Predictor module simulates the packing process for molecular crystals, generating a number of stable crystal structures that have a high probability of being found experimentally. The polymorphs generated are saved in trajectory files; tools are provided to analyze the properties of structures in these trajectory files, and to extract those required for further study into model spaces.


Introduction

Sections in this chapter

Using the Polymorph Predictor

Setting up

Monte Carlo packing simulation

Cluster analysis

Energy minimization

Trajectory file analysis and model extraction

Merging trajectory files

Reliability checking

Theory

References

Topic Reference
Loading forcefields   The Cerius2 Simulation Tools book  
Preparing conformers for flexible molecules   The Cerius2 Conformational Search and Analysis book  


Using the Polymorph Predictor

Several time-consuming steps are required to generate probable molecular crystals using Polymorph Predictor. However, once you have prepared and specified the molecular models to include in the asymmetric unit, and set up all parameters and options, the program can handle the entire prediction sequence automatically.

Step 1: Setup

Before you start, you must load and configure an appropriate Cerius2 forcefield. You must then prepare molecules to include in the asymmetric unit of the crystal structure being studied, and appropriately configure prediction procedure parameters and options. For details, see "Setting up" on page 129.

Step 2: Monte Carlo packing simulation

The Monte Carlo packing simulation searches each specified space group for the most probable crystal structures. For details, see "Monte Carlo packing simulation" on page 138.

Note

A trajectory file containing all accepted structures is output for each space group searched. You must perform steps 2, 3, 4, and 5 of the prediction sequence for each space group, using the trajectory files output from the previous step as input. See Appendix B, "File Formats" for more information about trajectory files.  

Step 3: Cluster analysis

Trajectory files generated by the Monte Carlo simulation for each space group contain large numbers of unoptimized structures; those with the lowest energy are the most likely to occur experimentally. At the local minima of the energy hyper surface explored during the simulation, clusters of many very similar structures (of which only the lowest energy representative is significant) are expected.

These clusters of similar structures and their lowest energy representatives can be identified by performing cluster analysis on the trajectory files output by the Monte Carlo simulation. Low energy representatives identified by clustering are output in trajectory file format. For details, see "Cluster analysis" on page 144.

Step 4: Energy minimization

The low-energy structures output by cluster analysis are still energetically unrefined and require optimization with respect to all degrees of freedom.

Energy minimization subjects each structure in a trajectory file output by cluster analysis to a full body minimization. The resulting optimized structures are saved in an output trajectory file. For details, see "Energy minimization" on page 147.

Step 5: Cluster minimized structures

Initially distinct structures may become very similar, or even identical, during minimization. You should remove such redundant structures from the trajectory files produced by energy minimization by running a final cluster analysis. The remaining structures are saved in an output trajectory file.

Trajectory file merging

Having performed steps 2 through 5 for each of a number of space groups, you are left with several output files (one per space group) containing optimized structures ready for final analysis and model extraction. However, it is possible that structures in the output from different space groups are identical. You may want to merge the output trajectory files and perform a final cluster analysis to remove such redundant structures. See "Merging trajectory files" on page 155. For more detailed information on trajectory files, please see Appendix B, "File Formats".

Trajectory file analysis

After each polymorph run, the resulting trajectory file can be analyzed using the Analysis item on the POLYMORPH card. Please see "Trajectory file analysis and model extraction" on page 149 for more information.

Reliability checks

It is recommended that you confirm the accuracy of the polymorphs obtained by the prediction procedures by performing one or more reliability checks on them. See "Reliability checking" on page 156.


General Methodology

Setting up

Before you start a polymorph prediction run, you must appropriately configure a Cerius2 forcefield and specify and prepare models of all molecules in the asymmetric unit of the crystal. Once you have defined the asymmetric unit, you can initiate further steps in the prediction sequence (Monte Carlo packing simulation, cluster analysis, minimization, and final clustering) as a complete, automated sequence or as individual steps. Either way, you should set parameters and options for each step prior to initiating it.

Configuring forcefields

You should use a forcefield that is appropriate for the atom types (that is, the elements and their hybridization, as well as the local geometry) in the structure. The latest version of the Dreiding forcefield, Dreiding2.21, is the most appropriate for molecular crystals without metal atoms and is recommended for Polymorph Predictor runs.

You must specify use of the Ewald long-range summation method for electrostatic and van der Waals interactions, as other methods do not handle interactions between molecules correctly. To find out more about Cerius2 forcefields, how to load them, and set parameters for energy terms, see the chapter on the Open Force Field in Cerius2 Simulation Tools.

Preparing models

You should prepare and specify models that will be included in the asymmetric unit of the crystal structure being studied (including the number of times the model should appear in the unit) using the Polymorph Run control panel.

Minimize and calculate charges

Each model you specify for inclusion in the asymmetric unit should be energetically minimized and have charges calculated and assigned to its atoms before polymorph prediction begins.

Use the Cerius2·Minimizer to perform energy minimization on each molecule (see the Cerius2 Simulation Tools). If you have experimental data for comparison (for example, a powder pattern), MOPAC ESP charges for atoms in the models obtained using the Cerius2·MOPAC module should be sufficiently accurate. If you require improved quality predicted structures, particularly for ab initio predictions where no experimental data is available, use of the Gaussian or another ab initio quantum mechanics package is recommended:

1.   Perform a Gaussian optimization of the molecules in the asymmetric unit, preferably with the 6-31G** basis set.

2.   Obtain Gaussian ESP charges for all atoms.

3.   Edit Open Force Field parameters so that the forcefield minimized structure is as close as possible to the Gaussian optimized structure. When performing this editing, it is recommended that you start with the simple parameters and work upward in complexity (for example, edit bond potentials before valence potentials) as the amount of work required to refine increases exponentially.

4.   Specify rigid body constraints which you wish to apply during the final optimization of the structures. By default, Polymorph Predictor treats molecules as rigid during the initial Monte Carlo/simulated annealing step, but, during the final minimization, the flexibility of the molecules is taken into account and all atomic positions are optimized individually. If you wish certain structural units or whole molecules to remain rigid during this final minimization, you must specify these rigid body constraints before starting the simulation sequence. See the Cerius2 Forcefield Engines manual on how to define rigid body constraints.

Note

With no experimental data available, it is impossible to check the results of ab initio predictions. You may find it useful to pack similar molecules for which experimental data is available. You can then check whether the motifs of the packing (for example, hydrogen bonding patterns) are found using Polymorph Predictor.  

Flexible molecules

The Monte Carlo packing simulation treats all molecules as rigid structures. For flexible molecules, you should prepare conformers corresponding to local energy minima of the conformational space using Conformer Search and Conformer Analysis. After preparing the conformers, perform the polymorph prediction procedure on each one. (See the Cerius2 Conformer Search and Analysis book.)

Where a flexible molecule occurs more than once in the asymmetric unit, you must perform prediction runs for each possible combination of stable conformers. For example, for a flexible molecule with three stable conformers, A, B, and C, where you suspect that the molecule occurs twice in the asymmetric unit, six prediction sequences are required -- for combinations A+A, A+B, A+C, B+B, B+C, and C+C.

Simulation parameters

The Monte Carlo packing simulation, cluster analysis, and energy minimization processes each have a number of associated options and parameters that affect their operation. Unless you want to use default values, you should access the preferences panels that contain the controls for these parameters, and set them appropriately prior to initiating any of these procedures.

Predicting Polymorphs

Polymorph Predictor limitations

Flexible molecules

Polymorph Predictor treats all molecules as rigid structures. For flexible molecules, conformers corresponding to local energy minima of the conformational space must be prepared and the polymorph prediction procedure performed on each one.

Where a flexible molecule occurs more than once in the asymmetric unit, prediction runs must be performed for each possible combination of stable conformers. This means that the number of runs required quickly becomes unmanageable as the number of stable conformers and/or molecules in the asymmetric unit increases.

For practical reasons then, molecules included in the asymmetric unit for prediction runs should have limited flexibility. Sensible limits are five or fewer conformers, within approximately two kcal/mol of the global minimum.

Forcefields

Polymorph Predictor calculates likely crystal structures with forcefield technology. Many different forcefields are available, each optimized for different elements and structure types. The accuracy of a prediction run is therefore related to the suitability of the forcefield for the crystal being studied.

You should use a forcefield that works well for the atom types (that is, the elements and their hybridization) in the structure. The latest version of the Dreiding forcefield, Dreiding2.21, is optimized for molecular crystals; however, it does not work well for crystal structures containing metals. Dreiding models carbon, nitrogen, oxygen, and hydrogen most accurately; it models sulphur, phosphorous, chlorine, and bromine less accurately, and so on down the periodic table.

Processing time

Prediction time increases in proportion to the number of:

Doubling any of the listed values roughly doubles processing time for the Monte Carlo step and quadruples processing time for the minimization. Recommended limits to the number of atoms in he asymmetric unit are 150 as an upper bound and 100 as a reasonable maximum.

Intramolecular symmetry

Polymorph Predictor assumes no intramolecular symmetry -- there should be no bonds between the asymmetric unit and any symmetry copies of the asymmetric unit.

Running a prediction

You can initiate a complete polymorph prediction sequence or you can launch each step manually from the Polymorph Run control panel. The following steps are performed for each space group to be searched during the Monte Carlo simulation (please see the online help for more information on the control panels):

You perform each step using the parameters and options on corresponding preferences panels. Open the appropriate panels to change any of their values from their defaults.

Restarting interrupted prediction procedures

Each of the polymorph prediction procedures (Monte Carlo simulation, cluster analysis, and energy minimization) can take a long time (up to several hours) to run. To avoid the potential time wastage and inconvenience that a system crash or unavoidable process interruption could cause, Polymorph Predictor provides a mechanism for restarting such interrupted jobs.

Note

Only the operation in process at the time of the interruption can be restarted. Automatic prediction sequences cannot be resumed; you must manually complete any operations required after you restart a procedure.  

In addition to the trajectory files that are generated during each procedure, restart and summary files (with the same names as the trajectory file, but with extensions of .rst and .summary rather than .pmp) are periodically written. If the run completes successfully the restart file is deleted, but not the summary file.

Restart files contain information necessary to complete the interrupted run from the point at which the file was last updated, including the settings of all relevant Polymorph Predictor options and parameters for the operation. All such options are restored and used if a run is restarted. Note that forcefield and other settings are not included in restart files and must be restored by alternative means. The frequency of restart file generation is necessarily different for each procedure:

Defining the molecules in the crystal structure

The molecules to include in the asymmetric unit of the crystal structure you are studying are defined (by model number) in the lower portion of the Polymorph Run control panel. You can also specify that multiple copies of a model be incorporated into the asymmetric unit. The content of each model is treated as a rigid unit during the Monte Carlo simulation.

To run a complete polymorph prediction sequence

1.   Load and/or sketch molecules that you want to include in the asymmetric unit of the crystal system you want to study.

2.   Load an appropriate forcefield (Dreiding2.21 is recommended) and select the Ewald long-range summation method for electrostatic interactions using the Open Force Field.

Note

If you use the OFF forcefield on a non-periodic model after selecting the Ewald method, the method will automatically switch to spline.  

3.   Minimize, obtain MOPAC or Gaussian ESP charges, and, if flexible, obtain low energy conformers for all molecules in the asymmetric unit.

4.   If you wish structural units or whole molecules to remain rigid during the minimization step of the Polymorph prediction sequence, specify these rigid body constraints for each molecule in the asymmetric unit using the options provided on the OPEN FORCE FIELD/Constraints or MINIMIZER/Constraints control panels. See the Forcefield Based Simulations manual for details on how to define rigid body constraints.

5.   Open the Polymorph Run control panel by choosing Run from the POLYMORPH PREDICTOR card.

6.   Specify the model or models containing the molecules you want to include in the asymmetric unit of the crystal. Specify the number of copies of any model that should be included more than once in the asymmetric unit.

7.   If the default settings for the Monte Carlo simulation are not satisfactory, specify new preferences:

a.   Open the Polymorph MC Preference control panel by pressing the Preferences... button beside the Run Monte Carlo button.

b.   Set the parameters and options on the panel and its two subpanels: Polymorph MC Parameters and Polymorph MC Output. See "Setting Monte Carlo preferences" on page 140.

8.   If the default settings for the cluster analysis are not satisfactory, specify new preferences:

a.   Open the Polymorph Cluster Prefs control panel by pressing the Preferences... button beside the Run Cluster Analysis button on the Polymorph Run control panel.

b.   Set the parameters and options on the panel and the Polymorph Cluster Output subpanel. See "Setting Cluster Analysis preferences" on page 145.

9.   If the default settings for energy minimization are not satisfactory, specify new preferences:

a.   Open the Polymorph Minimize Prefs control panel by pressing the Preferences... button beside the Run Energy Minimization button on the Polymorph Run control panel.

b.   Set the parameters and options on the panel and the Polymorph Minimize Output subpanel. See "Setting minimization preferences" on page 147.

10.   Initiate the polymorph prediction sequence by pressing the Predict Polymorphs button on the Polymorph Run control panel. Polymorph Predictor performs all necessary procedures automatically, including Monte Carlo simulation and subsequent cluster analysis, minimization, and final clustering for each specified space group.

11.   Perform reliability checks to verify the accuracy of the polymorphs obtained by the prediction procedure. See "Reliability checking" on page 156.

To predict polymorphs manually

1.   Load and/or sketch molecules that you want to include in the asymmetric unit of the crystal system you want to study.

2.   Load an appropriate forcefield (Dreiding2.21 is recommended) and select the Ewald long-range summation method for electrostatic interactions using the Open Force Field.

Note

If you use the OFF forcefield on a non-periodic model after selecting the Ewald method, the method will automatically switch to spline.  

3.   Minimize, obtain MOPAC or Gaussian ESP charges, and, if flexible, obtain low energy conformers for all molecules in the asymmetric unit.

4.   If you wish structural units or whole molecules to remain rigid during the minimization step of the Polymorph prediction sequence, specify these rigid body constraints for each molecule in the asymmetric unit using the options provided on the OPEN FORCE FIELD/Constraints or MINIMIZER/Constraints control panels. See the Forcefield Based Simulations manual for details on how to define rigid body constraints.

5.   Open the Polymorph Run control panel by choosing Run from the POLYMORPH PREDICTOR card.

6.   Specify the model or models containing the molecules you want to include in the asymmetric unit of the crystal. Specify the number of copies of any model that should be included more than once in the asymmetric unit.

7.   If the default settings for the Monte Carlo simulation are not satisfactory, specify new preferences:

a.   Open the Polymorph MC Preference control panel by pressing the Preferences... button beside the Run Monte Carlo button.

b.   Set the parameters and options on the panel and its two subpanels: Polymorph MC Parameters and Polymorph MC Output. See "Setting Monte Carlo preferences" on page 140.

8.   Run the Monte Carlo simulation by pressing the Run Monte Carlo button on the Polymorph Run or Polymorph MC Preferences control panel. A trajectory file is output for each space group searched.

9.   If the default settings for cluster analysis are not satisfactory, specify new preferences:

a.   Open the Polymorph Cluster Prefs control panel by pressing the Preferences... button beside the Run Cluster Analysis button on the Polymorph Run control panel.

b.   Set the parameters and options on the panel and the Polymorph Cluster Output subpanel. See "Setting Cluster Analysis preferences" on page 145.

10.   Perform cluster analysis on the contents of each trajectory file (one per space group searched) output by the Monte Carlo simulation by pressing the Run Cluster Analysis button on the Polymorph Run or Polymorph Cluster Prefs control panel.

11.   If the default settings for energy minimization are not satisfactory, specify new preferences:

a.   Open the Polymorph Minimize Prefs control panel by pressing the Preferences... button beside the Energy Minimize button on the Polymorph Run control panel.

b.   Set the parameters and options on the panel and the Polymorph Minimize Output subpanel. See "Setting minimization preferences" on page 147.

12.   Perform energy minimization on the contents of each trajectory file (one per space group searched) output by cluster analysis by pressing the Energy Minimize button on the Polymorph Run control panel or the Minimize button on the Polymorph Minimize Prefs control panel.

13.   Specify preferences for and perform a final cluster analysis on each trajectory file (one per space group searched) output by energy minimization. See steps 8 and 9 above.

14.   Perform reliability checks to verify the accuracy of the polymorphs obtained by the prediction procedure. See "Reliability checking" on page 156.

Please see the online help for more detailed information on the control panels.

Monte Carlo packing simulation

The first step of a polymorph prediction sequence is a Monte Carlo simulation of the thermodynamic movement of the system for each selected space group. The simulation consists of two phases, heating and cooling, and is thus termed simulated annealing. The Metropolis algorithm is used to determine whether generated trial structures are accepted or rejected.

The simulation

Generating initial
structures

An initial structure is generated for each space group from the models specified on the Polymorph Run control panel. Each of the specified models is copied into the current model space (which should be empty) before starting the simulation. If more than one copy of any model is required, the fragment is copied the appropriate number of times.

Heating phase

Firstly, each trial crystal is heated. During heating, the temperature for each new trial, Tnew, is obtained from that used for the previous trial, Told, using a user-definable heating factor, Th, by:

Eq. 1            

Heating continues until either a preset maximum temperature is reached or a specified number of consecutive trial moves accepted.

If the current trial is rejected, the Monte Carlo move factor is halved. (Note that the move factor has an initial value of 1.0 during the cooling phase, at which all parts of phase space are accessible within one move. During the heating phase, the value is fixed at 1.0). If the trial is accepted this factor is doubled, with a maximum allowed value of 1.0.

Cooling phase

If a trial is accepted during cooling, the temperature to use for the next trial Tnew, is obtained from that used for the previous trial, Told, using a user-definable cooling factor, Tc, by:

Eq. 2            

The default cooling factor specified is modified according to the number of rigid units in the asymmetric unit, N, using the following algorithm:

Eq. 3            

The Monte Carlo move factor is updated with the same algorithm used during the heating phase. The run finishes when either the temperature reaches a preset minimum, a specified maximum number of trials have occurred, or the Monte Carlo move factor becomes lower than a minimum allowed value.

Trial steps

Each Monte Carlo trial consists of the following steps:

1.   Make a random change to the orientation of each rigid unit within the asymmetric unit.

2.   If more than one rigid unit is present in the asymmetric unit, make a random change to the relative orientation of each rigid unit within the asymmetric unit.

3.   Depending on space group restrictions, make a random change to the relative orientation of the lattice vectors of the unit cell.

4.   If there is more than one rigid unit in the asymmetric unit:

a.   Expand the asymmetric unit in free space until no van der Waals contacts can be found within the asymmetric unit. (Relatively large step sizes are used during expansion.)

b.   Contract the asymmetric unit in free space while maintaining zero van der Waals contacts within the asymmetric unit such that the molecules in the asymmetric unit are brought into close contact. (Smaller step sizes are used to pack the crystal as tightly as possible.)

Steps a and b both use a random selection procedure to move the relevant translational degrees of freedom.

5.   Remove van der Waals contacts from the crystal:

a.   Increase the magnitude of the lattice vectors until there are no van der Waals contacts. (Relatively large step sizes are used.)

b.   Decrease the magnitude of the lattice vectors while maintaining zero van der Waals contacts. (Smaller step sizes are used to pack the crystal as tightly as possible.)

Steps a and b both use a random selection procedure to move the relevant translational degrees of freedom. When appropriate, the translation of the asymmetric unit within the unit cell also moves randomly during step B.

6.   Calculate the energy and density of the crystal and apply a standard Metropolis acceptance/rejection test. In addition, a density constraint is applied: if the density of the structure is below 0.3 g/cm3, the step is always rejected.

Setting Monte Carlo preferences

You specify the parameters and options that affect the Monte Carlo packing simulation using options on (or accessible from) the Polymorph MC Preferences control panel (please see the online help for control panel information).

Search parameters
You can individually define search parameters such as the heating and cooling factors (Th and Tc) described in the previous section (see page 138). Alternatively, you can conveniently adjust the default values of all the parameters necessary to provide a coarse, medium, or fine search of phase space during the simulation using the Search Level popup.

The most frequently used options are available on the Polymorph MC Preference panel. Others are located on the directly accessible Polymorph MC Parameters control panel (by pressing Preferences...).

Effects of changing search parameters
This section discusses the effects and consequences of changing three of the most significant search parameters:

Heating phase

The purpose of the heating phase is to find an acceptable starting temperature for simulated annealing (that is, the cooling phase). Ideally, the cooling phase should start from a temperature at which all possible moves are accepted, that is, where phase space can be sampled completely randomly. In order not to waste cooling steps during which only random moves are made, you should make this temperature as low as possible. It is also important that this temperature is determined with the minimum of effort. If cooling is started after a certain number of trials are accepted in a row then there is a good likelihood that the desired temperature has been reached somewhere early in the heating phase.

Scaling the temperature for each trial step may result in overshooting the desired value. To attempt to avoid this you can reduce the number of trials to accept before cooling. However, this increases the risk of actually starting at too low a temperature; there is a finite probability of accepting any number of steps at any temperature and this probability increases as the required number of accepted steps is decreased. The default values are a reasonable compromise.

Cooling phase

The rate of cooling is the most important factor during cooling. It is useful to think of a particular cooling rate as defining a level of detail for the search of phase space. A longer search at a particular temperature increases the probability of finding the energy minima at a particular level of detail.

It is thus apparent that the cooling factor is the fundamental parameter in determining the quality of the configurations generated during the Monte Carlo search. Decreasing the cooling factor improves the detail and increases the length of the simulation. The default cooling factor is inversely proportional to the number of molecules in the asymmetric unit. You may want to also reduce it for larger molecules. The default settings provide a good compromise between accuracy and length of search.

Space groups
You can search up to 20 space groups during the simulation. However, such a search is very time consuming and may not yield significantly greater accuracy than a search of just the few most significant space groups. This is illustrated by the following table, which lists the 17 most common space groups (which appear on the Space Group Selector popup) and the percentage of organic crystals that occur in them.

Space group Crystal system Number of
symbolic
operators
Degrees of
freedom
Occurrence (%)
P 21/c   Monoclinic   4   10   35.9  
P -1   Triclinic   2   4   12   17   13.7  
P 21 21 21   Orthorhombic   4   9   11.6  
P 21   Monoclinic   2   9   6.7  
C 2/c   Monoclinic   8   10   6.6  
P b c a   Orthorhombic   8   9   4.3  
P n m a   Orthorhombic   8   9   1.9  
P n a 21   Orthorhombic   4   8   1.8  
P b c n   Orthorhombic   8   9   1.2  
P 1   Triclinic   1   2   4   9   15   27   1.0  
C c   Monoclinic   4   8   0.9  
C 2   Monoclinic   4   9   0.9  
P c a 21   Orthorhombic   4   9   0.8  
P 21/m   Monoclinic   4   10   0.8  
C 2/m   Monoclinic   8   10   0.6  
P 21 21 2   Orthorhombic   4   9   0.6  
P 2/c   Monoclinic   4   10   0.5  

You should define space groups using the standard nomenclature in International Tables of Crystallography Volume A (1989) or International Tables numbers.

You should use brief descriptions, with single spaces placed between the descriptor for the lattice type and each of the symmetry directions. You should represent subscripts, as used for screw axes, by normal numbers (so that a screw dyad is represented by 21, a triad by 31 or 32, and so on). Represent inversion axes with minus signs (for example, -3, -4, -6). Some examples are:

P 21/c
P 42/m
P -4 21 c

You can define only the first unit cell option for each space group. Alternatively, you can supply International Tables numbers.

Output options

Output options, such as trajectory file naming, progress plotting, model viewing, and text window output are set on the Polymorph MC Output control panel, which is accessed from the Polymorph MC Preference panel (by pressing Output...).

Please see the online help for detailed information on the control panels.

Cluster analysis

Clustering Monte Carlo output

The Monte Carlo simulation outputs large numbers of unoptimized structures for each space group. You can expect these to include low-energy structures and clusters of many very similar structures that are obtained at the local minima of the energy surface of the crystal.

Of these structures, those with the lowest energy are the most likely to occur experimentally and are, therefore, of interest; you can disregard the rest.

To obtain the low-energy representatives of each cluster, you perform a cluster analysis procedure on trajectory output from Monte Carlo simulations.

Clustering energy
minimization output

Initially-distinct low-energy structures produced by cluster analysis of Monte Carlo output can become very similar during energy minimization. You can remove these redundant structures by performing a final cluster analysis procedure on trajectory output from energy minimization procedures.

Procedure

The Polymorph Predictor automated cluster analysis procedure consists of the following steps:

1.   The source trajectory file is traversed and the unclustered structure with the lowest energy is identified and written to the output trajectory file.

2.   All other unclustered structures are compared to the low-energy cluster representative (obtained in step 1) by means of a clustering algorithm. The algorithm makes its determination by examining the partial radial distribution functions between pairs of forcefield atom types of the two structures being compared.

3.   Structures deemed to be similar to the low-energy reference structure are marked as belonging to the current cluster.

4.   Steps 1, 2, and 3 are repeated until there are no further structures in the source trajectory file, or a user-specified number of clusters has been identified.

When the procedure terminates, the output trajectory file contains only the low-energy representative of each cluster identified.

Clustering algorithm

1.   Groups atoms in the asymmetric unit cell by element, atom name or forcefield type for the low energy reference structure.

2.   For each pair of forcefield types, calculates all squared interatomic distances between atoms of those types less than a specified cutoff distance.

3.   Calculates partial radial distribution function (.rdf file), identifies the peaks, and compares the peaks for two structures.

4.   If radial distribution function between two structures is identical to the low energy reference structure, removes duplicate structure.

5.   Repeats steps 1 through 4 for all the clusters.

Setting Cluster Analysis preferences

The parameters and options that affect the cluster analysis procedure are specified using options on (or accessible from) the Polymorph Cluster Prefs control panel (see the online help).

Input file

By default, the trajectory file output by the last run prediction procedure is selected as the input file for cluster analysis. If you do not want to use this file, choose another using the file selector on the panel.

Clustering parameters

You can specify the cutoff distance for interatomic relationships to be examined during clustering such that only sensible interactions are considered.

You can also specify the tolerance of the clustering algorithm to differences between structures. The lower the tolerance, the more similar a structure must be to the reference structure to be deemed a member of the cluster. Low tolerances generally result in the identification of a greater number of smaller clusters. You can limit the number of clusters output to a sensible value; if the specified number of clusters is obtained before all structures in the input trajectory file are clustered, the procedure is terminated. However, it is sensible to use a combination of parameters that place the large majority of structures in one of the identified clusters.

Sensible tolerance and maximum cluster values to provide a coarse, medium, or fine detailed cluster analysis can be conveniently obtained using the Search Level popup.

Clustering tolerances for Monte Carlo and minimized output

The clustering tolerances required for output from the Monte Carlo and energy minimization phases are significantly different. The clustering level required after the minimization phase needs to be much finer than that required after the Monte Carlo search.

Monte Carlo output

The Monte Carlo search outputs a lot of raw structures that differ significantly. Here you should set a tolerance that performs as much data reduction as possible while obtaining as many different starting structures for minimization as possible.

Minimized output

Minimized structures are fully optimized; a small difference between these structures may represent a different polymorph. At this stage, you typically have tens of structures and can, therefore, afford to perform a more detailed search. It is strongly recommended that you use the defaults associated with Fine clustering for all post minimization clustering. You can identify any remaining truly similar structures by visual comparison and/or using other Cerius2 instruments (for example, Diffraction).

Identifying sensible clustering values

You can use the Cluster Measure tool available on the Polymorph Properties control panel (see "Trajectory file analysis and model extraction" on page 149) to investigate sensible clustering values for an output file. To do this extract the first frame into a model and obtain measures against all other frames.

Output options

Output options, such as output file naming, progress plotting, and text window output are set on the Polymorph Cluster Output control panel, which is accessed from the Polymorph Cluster Prefs control panel (by pressing Output...).

Please see the online help for more information on the control
panels.

Energy minimization

Cluster analysis outputs low energy cluster representatives that are still energetically unrefined and require optimization with respect to all degrees of freedom.

The energy minimization procedure does a full minimization of each structure output into a trajectory file by cluster analysis. The minimization takes into account any rigid body constraints defined on the molecules forming the asymmetric unit.

Optimizing the current model

The Polymorph Predictor energy minimization procedure can also be performed on individual models. The minimizer used in the procedure is the same one used by other Cerius2 modules, and can therefore be used with other structures in other modules.

Setting minimization preferences

You can specify the parameters and options that affect energy minimization using options on (or accessible from) the Polymorph Minimize Prefs control panel (see the online help).

Input file

By default, the trajectory file output by the last run prediction procedure is selected as the input file for energy minimization. If you do not want to use this file, choose another using the file selector on the control panel.

Termination criteria

Various termination criteria for the minimization in Cerius2 exist: a target root mean squared force or a maximum number of minimization steps.

Rigid Bodies

If rigid bodies are defined on the molecules forming the asymmetric unit, the Minimizer normally recognizes these rigid bodies and does not change the relative position of atoms within a rigid body. It is often desirable to compare results with and without rigid body minimization -- for this purpose, a switch Ignore rigid bodies is provided, allowing you to rerun a minimization on a trajectory file as if no rigid bodies had been defined.

Output options

You can set output options, such as output file naming, progress plotting, model and text window output, on the Polymorph Minimize Output control panel (accessed from the Polymorph Minimize Prefs control panel by pressing Output...).

Please see the online help for more information about control panels.

Notes on rigid body minimization

Trajectory file analysis and model extraction

Polymorph Predictor generates and outputs each structure from the Monte Carlo, cluster analysis, and energy minimization procedures as a frame in a trajectory file. If you want to study a structure in this form using the full range of Cerius2 visualization tools and computational instruments, you must extract it from the trajectory file into a model space.

Polymorph Predictor analysis tools allow you to study the properties of the contents of a trajectory file, identify the most interesting structures, then extract them into model spaces. Three steps are involved, the controls for each contained on a separate control panel available from the Analysis pullright on the POLYMORPH card.

Select trajectory file

By default, the trajectory file output by the last-run prediction procedure is selected as the input file for analysis and extraction. If you do not want to analyze this file, choose another using the file selector on the Polymorph Analysis File control panel (see the online help).

Properties

Tools on the Polymorph Properties control panel (see the online help) allow you to plot profiles and distribution histograms of structures in the trajectory file (by frame number) against a number of their properties:

If you have carried out an automatic comparison of powder spectra (see below) for the selected trajectory, the measures of comparison (R_P, R_WP, R-INDEX and CMACS) are listed as additional properties.

You can also search for structures in the file with the lowest or highest values (as appropriate) for these properties.

You can compare an existing model to the currently-selected analysis trajectory file using the clustering algorithm. Structures are sorted by similarity to the reference model, and their frame numbers listed in the text window. A small Cluster Measure Preferences panel allows you to set the cluster parameters used for this comparison. Note that changing the settings specified on this panel does not affect the clustering parameters used during a Polymorph Prediction run. These can only be set on the Polymorph Cluster Prefs panel, which is available from the main Polymorph Run panel (please see "Setting Cluster Analysis preferences" on page 145 for more information).

Model extraction

Having identified a structure for further study, you can extract it into the current model space using the controls on the Polymorph Extract control panel (see the online help). You specify the structure to extract using its frame number; information on the selected structure can be listed in the text window prior to extraction. You can also write frames represented by points selected on a graph to a new trajectory file.

Please see the online help for more information on the control panels.

Automatically comparing powder spectra

The Polymorph Predictor identifies a limited number of likely low-energy crystal structures for a given compound. To determine which of these candidate structures is/are experimentally observed, a common technique is to compare simulated powder patterns with experimentally observed powder patterns.

Cerius2 incorporates functionality automating this time-consuming step. With the help of the Diffraction-Crystal module, the program automatically generates simulated powder patterns for each frame of a polymorph trajectory and computes figures of merit indicating the "closeness" between simulated and experimental pattern for each frame. These figures of merit can then be analyzed with the usual tools provided on the Polymorph Analysis/Properties control panel.

Automatically comparing powder patterns is done using the Powder Comparison control panel accessible from the Polymorph Analysis pullright. For a given trajectory file, you can carry out a number of different comparisons - for example, you may wish to compare the frames in the trajectory to more than one experimental pattern, or you may wish to change the powder diffraction settings used to generate the simulated spectra which are then compared to the experimental spectrum. You can store each of these comparisons with the trajectory file, giving them Results Identifiers.

To set up an automatic powder comparison analysis, you should be familiar with the Diffraction-Crystal module (see the Cerius2 3.8 Analytical Instruments manual), especially with the generation of simulated powder patterns.

Select trajectory file

By default, the trajectory file output by the last-run prediction procedure is selected as the input file for the powder comparison. If you do not want to analyze this file, choose another using the file selector on the Polymorph Analysis File control panel.

Select experimental spectrum

Next, you need to specify with which experimental you wish to compare the frames in the trajectory. The Load Powder ... button on the Powder Comparison control panel opens the 1-D Experimental Data control panel (also accessible from the DIFFRACTION-CRYSTAL card). A large variety of different file formats is supported, the only restriction being that the abscissae of the experimental spectrum need to be equally spaced.

Transform experimental spectrum

It is often necessary to subtract the background from the experimental spectrum and/or to artificially broaden it, in order to allow a meaningful comparison between the simulated and experimental spectrum. Broadening and background subtraction are performed using the Prepare Powder button on the Powder Comparison control panel, which opens the 1-D Data Transformations control panel.

You can fit and subtract a background polynomial using the Subtract polynomial button. The order of the background polynomial can be specified using the input field below - the default is 5. A higher order usually guarantees a more detailed fit. In most cases, the algorithm employed to subtract the background works satisfactorily -- in a few cases, you may need to try a number of iterations with varying order of the background polynomial before achieving a satisfactory result.

For the RP, RWP, and R-Index measures, it is often advantageous to broaden the experimental spectrum before comparison (see the theory below). The options on the 1-D Data Transformations panel allow you to uniformly broaden the experimental spectrum with a Gaussian or Lorentzian. You can also specify the width of the broadening function.

Specify Diffraction-Crystal settings

To perform a meaningful comparison, you must ensure that the settings used to generate the simulated spectra mimic the experimental conditions as closely as possible. In particular, the display range, the radiation wavelength and instrumental or other peak broadening factors may need to be adjusted. Usually, the chosen display range should be similar to the range on which the experimental data is defined.

You adjust all these parameters using the Calculate Crystal Diffraction control panel accessible via the Diffraction Prefs ... button, and the Preferences control panels accessible from there (see the Cerius2 3.8 Analytical Instruments manual). If you wish to test your settings before carrying out the comparison for the full trajectory, you can first extract a frame from the trajectory into a model and then use CALCULATE diffraction pattern on the Calculate Crystal Diffraction control panel to carry out a simulation (see the Cerius2 3.8 Analytical Instruments manual).

Set CMACS intervals

The CMACS measure is calculated individually on a number of equally spaced intervals over the display range, and the results are averaged over these intervals. The default for the number of intervals is 10. In cases where the spectrum contains a very large number of peaks, you may wish to increase this value. You can change this value using the CMACS Intervals input field on the Powder Comparison control panel.

Specify an identifier for the comparison

Since you may wish to carry out more than one comparison for a given trajectory file, you need to specify a name for each comparison. Simply enter the desired name in the Results Identifier field, or use the default name suggested.

Calculate measures of comparison

Use the Compare radio button to compare the current trajectory with the current experimental powder pattern. All four measures of comparison are evaluated (RP, RWP, R-Index, and CMACS). The calculation may take a few minutes, depending on the size of the trajectory. You can monitor the progress on the status bar which appears after you start the calculation.

Note

The calculated measures of comparison are not written to the trajectory file itself, but to a separate file with the same base name as the trajectory file. It can be recognized by the suffix .rfc. If you copy/move trajectory files using UNIX commands and you want to keep the results of your powder comparison, be sure to also copy/move the .rfc files to the same directory as the .pmp files - otherwise, Polymorph will not know about their existence.  

Analyze measures of comparison

Once a powder comparison has been done, you can use the standard tools on the Polymorph Analysis control panel to analyze the results. If you have carried out more than one comparison for the trajectory file, you must choose which results to analyze, by using the pulldown menu in the lower half of the Powder Comparison control panel. All the analysis tools extract information from the calculation specified in this field. Using the Property pulldown on the Polymorph Analysis control panel, you can now choose which of the four measures of comparison you wish to analyze. You can plot the measures as a function of frame number, find the frames with the best match (using the Search for Lowest ... Frames option), and create a new trajectory containing only frames whose powder pattern is close to the experimental pattern, indicated by small values for the comparison measures.

Show settings

If you do not remember the Diffraction Crystal settings and/or the experimental powder pattern to which a particular comparison corresponds, use the Show settings button on the Powder Comparison control panel. It prints, among other things, a number of DIFF-CRYSTAL commands which, if executed as a script, will reset the DIFFRACTION-CRYSTAL settings to the values used for the powder comparison identified by Comparison results to analyze.

Delete Set

You can delete the comparison identified by Comparison results to analyze using the Delete option on the Powder Comparison control panel.

To automatically compare powder spectra

1.   Select a trajectory file using the Polymorph Analysis File control panel.

2.   Load an experimental spectrum using the Load Powder ... option.

3.   If desired, subtract a background and/or broaden the experimental spectrum using the Prepare Powder ... options.

4.   Set up appropriate parameters for simulating powder spectra using the Diffraction Prefs ... options.

5.   If required, modify the number of intervals for the CMACS measure on the Powder Comparison control panel.

6.   Type in an identifier for the calculation in the Results Identifier field.

7.   Perform the comparison by clicking the Compare using ... button.

8.   To extract information about a particular calculation, toggle the Comparison results to analyze field to the results identifier for that calculation.

9.   Select the comparison measure you wish to print/visualize/ analyze using the Property pulldown on the Polymorph Analysis card.

10.   Print/plot/analyze the comparison measures using the standard tools on the Polymorph Analysis control panel.

11.   Print information about the calculation identified by Comparison results to analyze using the Show settings option.

12.   Delete a calculation using the Delete option.

Note

It is important to point out the limitations of this feature. In many trials, we have found that if one of the frames in the trajectory is reasonably close to an experimental spectrum, the program is usually able to identify this frame as one which has the lowest or at least one of the lowest CMACS measures. The reverse is not true -- a small CMACS measure does not guarantee that the frame really corresponds to the experimental spectrum. Additional analysis is required to establish this -- for example, you can extract a promising frame and try to use Rietveld refinement (see the Cerius2 3.8 Analytical Instruments manual) to test the fit between theory and experiment.

If you find a frame which is among the lowest in energy within the trajectory and for which the comparison measures consistently indicate it is close to the experimental spectrum, there is a good chance that you have identified the correct structure. But additional tests (visually inspecting the spectra, Rietveld refinement, etc.) always need to be carried out to verify the result.

Merging trajectory files

Having clustered, minimized, and re-clustered the output from the Monte Carlo search for a number of space groups, you are left with several trajectory files (one per space group searched) containing structures that are likely to be found experimentally.

Although the structures in each file are unique within that file, it is possible that structures in the output files derived from Monte Carlo searches of different space groups are, in fact, identical. To remove such redundant structures and provide a single file for final analysis and model extraction, you can merge the final trajectory files (using the controls on the Polymorph File Merge control panel) and perform a final cluster analysis.

Note

All the merged trajectory files should contain output derived from Monte Carlo searches of the same asymmetric unit. In particular, the number of rigid units in the asymmetric unit initially searched must be the same.  

Please see the online help for more information on the control panels.

Reliability checking

It is recommended that you confirm the reliability of the search procedures by performing one or more checks on the validity of the final output structures. This section describes a few such tests.

Examine structures/
properties

You may be able to quickly determine the plausibility of a polymorph by examining its structure and basic properties to determine whether their values seem reasonable. Properties to examine include:

Comparison against experimental data

Where experimental data is available, comparison against properties calculated for a predicted structure using Cerius2 computational instruments can be very effective.

For example, take a case in which an experimentally obtained x-ray powder pattern is available. Using the Powder Comparison feature described above, you can compare simulated powder spectra from the predicted structures to the experimental spectrum and identify structures within the trajectory whose powder spectrum is similar to the experimental spectrum. Powder Comparison uses the capabilities of the Cerius2 Diffraction-Crystal module. Good agreement between peak positions and intensities indicates that the predicted structure is accurate. You can then extract likely candidate structures and use Cerius2 Rietveld to adjust the predicted structure to give a good match between the experimental and predicted x-ray powder curves.

Symmetry

If a structure is predicted in a particular symmetry then it should also be predicted in a lower symmetry provided that the lower symmetry has sufficient degrees of freedom (that is, permitted translations, cell angles, cell lengths, and types of symmetry operators) to reproduce the higher symmetry.

For example, for a search in P 21 with one molecule in the asymmetric unit (Z=1) there are two molecules in the unit cell. P 1 symmetry has sufficient degrees of freedom to reproduce a P 21 structure if there are two molecules in the asymmetric unit (Z=2), which in this case will also be the number of molecules in the unit cell. Therefore, a P 1 (Z=2) prediction should find any structures found in a P 21 (Z=1) prediction. However, the P 1 (Z=2) prediction may find many better (that is, lower energy) structures.

Remember that the efficiency of the predictor is expected to degrade with increasing numbers of molecules in the asymmetric unit. You should not expect a P 1 (Z=16) prediction to reproduce a P 4/m m m (Z=1) structure.

Note

In certain cases, for example, if the molecule is not chiral, P 1 (Z=2) should reproduce P -1 (Z=1). However, if the molecule is chiral then P 1 (Z=2) should not reproduce P -1 (Z=1).  

Change the FF dielectric parameters

Current forcefield technology neglects polarization effects. If the stability ranking of predicted structures is not affected by changing the values of the dielectric constant epsilon (used in electrostatic calculations), then the prediction can be assumed reliable. To find out more about the dielectric parameters and recalculating the energies of your structures, see Cerius2 Simulation Tools.


Theory

Predicting polymorphs

Molecules tend to exhibit polymorphism, that is, they crystallize in many modifications, as a result of the physical conditions and the manner in which the crystals are obtained. Different polymorphs possess different values of the Gibbs free energy (G). However, it does not necessarily follow that a crystallization process yields the modification with the lowest value of G. In fact, the relationship between crystallization conditions and the polymorph obtained is not generally understood.

However, with certain limitations, Polymorph Predictor can generate crystal structures highly likely to be found experimentally by using sophisticated simulation algorithms. To find out more about Polymorph Predictor's limitations, see "Polymorph Predictor limitations" on page 132.

Theory

As stated earlier, experiments are likely to find crystal structures with a fairly low value of G. Thermodynamics states that G decomposes into with energy E, pressure p, volume V, temperature T, and entropy S.

The product pV is easily calculated, but can be disregarded as its value is too small to have any noticeable influence on G at meaningful pressures.

Effect of disregarding TS

Different crystal structures have different entropies at finite temperatures. TS therefore has a significant influence on G. However, because it is difficult to compute the entropy of a crystal, TS is not considered. Neglecting this factor changes the calculated energy slightly from the true value of G at finite temperatures. However, these effects are only expected to reorder the relative stabilities of the different low energy structures predicted at zero kelvin -- the lowest energy structures Polymorph Predictor obtains should represent those found experimentally, but probably with different relative stabilities.

This leaves the energy, E, as the quantity that must be optimized. For an arrangement of molecules, and thus also of a crystal as a special case of a regular arrangement, E is efficiently computed using a forcefield. Cerius2 provides advanced forcefield technology through the Open Force Field (OFF) and Force Field Editor (see the Cerius2 Simulation Tools).

The structure of a molecular crystal is efficiently mathematically defined by specifying the content of the unit cell and the values of the unit cell parameters; that is, the lengths of the unit cell vectors (a, b, and c) and the angles between them (, , and ). The content of the unit cell is specified by the coordinates of the asymmetric unit and by the space group.

Therefore, to find the most probable crystal structures, it may at first appear that all possible space groups (230 in total) must be searched with different numbers of molecules in the asymmetric unit, a very time-consuming process. However, the search can be dramatically reduced in scope because of the following factors:

Consequently, one can obtain highly probable molecular crystal structures by determining the most stable structures in just a few space groups and comparing the results in energy and structure. It may be possible to find a particular structure in a space group within the results of a sub group.

Reliability checks

In global optimization problems there is usually no guarantee that the global minimum has been located. In crystal packing problems, these difficulties are worsened since the complete lower part of the energy spectrum must be calculated. Polymorph Predictor provides many adjustable parameters that allow you to fine tune the packing simulation and increase search reliability. However, such increased reliability comes at the cost of dramatically increased processing demands, so it is not generally practical to perform the most detailed search possible.

It is advisable, therefore, to confirm the reliability of the search procedures by performing one or more checks on the validity of the output structures. If the results appear accurate, you know that your search accuracy was adequate; if not, you should adjust the search parameters accordingly. For some recommended reliability checks, see "Reliability checking" on page 156.

Further reading

Please see the following references for a more detailed discussion of the theory and background of the prediction procedures implemented in Polymorph Predictor (Gdanitz 1992, Karfunkel and Gdanitz 1992, Karfunkel and Leusen 1992, Karfunkel et al 1993, Gdanitz et al 1993, Karfunkel et al 1994).

Simulated annealing theory

Polymorph Predictor's Monte Carlo packing algorithm performs a simulated annealing procedure intended to search for the lowest minima of the energy function E of molecular crystals. Such a search is too difficult for classical algorithms for numerous reasons:

Packing simulation difficulties

Solutions provided by simulated annealing

The simulated annealing method works around these difficulties by treating the search for the global minimum of E as a thermodynamic problem. At some non-zero temperature, T, the crystal changes its structure randomly, and its energy fluctuates accordingly. To prevent the simulation from becoming trapped in a local minimum, cooling begins at a relatively high temperature; every crystal structure (within the constraints of the space group and asymmetric unit contents) can then theoretically be reached, and ergodicity therefore ensured.

As the temperature is lowered, the algorithm explores the potential energy hypersurface of the crystal with greater and greater precision. At lower temperatures, movement is restricted to regions where the value of E is relatively low. The simulation ends when T is so low that the crystal is frozen and the global minimum of E is hopefully found. The optimization process discovers other low-lying minima of E, and records the corresponding stable structures.

Automatically comparing powder spectrad

The automatic comparison of powder patterns relies on the availability of a computational measure to indicate "closeness" between powder spectra. Each powder pattern is characterized by a set of intensity values at N observation points k = 1, ..., N, corresponding to a set of diffraction angles. Four different figures of merit have been implemented to enable you to judge how "close" the simulated powder pattern from a given trajectory frame is to the experimental pattern. In each case, a small value indicates a close match and a large value indicates that the powder patterns are very different. The four measures of comparison are:

Eq. 4            

Eq. 5            

Eq. 6            

Eq. 7            

The weights wk at the observation points k are defined via wk = 1/Ik , where Ik = Yk(exp) + Yk(background) is the experimentally observed intensity. Ik must be positive everywhere and Yk(background) should be previously calculated and subtracted from the experimental spectrum using the Load Powder option. The first three of these measures are essentially Rietveld-like measures. If no background is available, wk = 1 is assumed for RWP and R-INDEX. Yk(calc) is the calculated intensity, rescaled such that

Eq. 8            

is minimized.

The CMACS measure has been developed at MSI and has performed well in preliminary tests. In contrast to the Rietveld measures, it does not rely on a significant overlap between peaks in the simulated and experimental spectrum, which for very sharply peaked spectra exists only if the simulated structure is very close to the experimental structure. The spectra are not compared directly, but instead they are integrated first, and these integrated functions are then compared. The comparison is carried out separately on each of a user-specified number of intervals over the display range, and the values obtained are averaged over these intervals.

Other measures quoted in the literature, such as the FOLD algorithm (Karfunkel et al 1993) and the overlap integral (Lawton and Bartell 1994), are all based on artificially broadening the experimental spectrum before carrying out a least squares comparison. Since RWP and R-INDEX are essentially least-squares measures, the FOLD algorithm can be shown to be essentially equivalent to simply broadening the experimental powder pattern before applying the RWP or R-INDEX measure


References

Belsky, V.K. and Zorkii, P.M., Acta. Cryst. A33 1004 (1977).

Gdanitz, R.J., Chem. Phys. Letters, 190, 391 (1992)

Gdanitz, R.J., Karfunkel, H.R., and Leusen, F.J.J., J. Mol. Graphics, 11, 275 (1993)

Karfunkel, H.R. and Gdanitz, R.J., J. Comput. Chem., 13, 1171 (1992)

Karfunkel, H.R. and Leusen, F.J.J., Speedup J., 6/2, 43 (1992)

Karfunkel, H.R., Rohde, B., Leusen, F.J.J., Gdanitz, R.J., and Rihs, G. J. Comput. Chem., 14, 1125 (1993)

Karfunkel, H.R., Leusen, F.J.J., and Gdanitz, R.J., J. Comput.-Aided Mat. Design, 1, 177 (1994).

Kitaigorodsky, A.I., Organic Chemical Crystallography, Consultants Bureau, New York (1961).

Lawton, S.L., and Bartell, L.S., Powder Diffraction 9, 124 (1994)




Last updated December 08, 1998 at 07:28PM Pacific Standard Time.
Copyright © 1998, Molecular Simulations, Inc. All rights reserved.