
This chapter provides an overview of molecular shape analysis (MSA) in the Drug Discovery Workbench and discusses each step of the MSA process.

Before you start MSA, you must have a properly licensed copy of the appropriate Cerius2 software, including a copy of QSAR+, installed on your system. If you have any questions about your system setup or your software license, please talk to your system administrator.
You should be familiar with the Cerius2 interface and tools before you begin using MSA. For information on the Cerius2 environment, consult the manuals listed in the preface, How To Use This Book.
Additionally, you must be familiar with the information presented in Chapter 14, Using the Equation Viewer.
To begin, go to the QSAR card deck and choose the SHAPE ANALYSIS (MSA) card.
This section describes a typical flow of activity and the tasks that make up the MSA process. It provides information to help you use the software effectively and to find more detailed information elsewhere in this and other Cerius2 books.
Overview of molecular shape analysis
MSA is an iterative process: steps are repeated, and the molecular shape similarities and other descriptors are checked and adjusted, until the resulting QSAR equation has optimal statistical significance.
The goal of MSA is to generate a QSAR equation that incorporates spatial molecular similarity data. The process, as described by Hopfinger and Burke, involves seven tasks illustrated in the following figure:
[Figure: summary of the MSA process]
The outcome of the MSA process is an optimized QSAR that can be used for activity estimation and ligand evaluation. The set of choices available for each task is employed to generate trial QSARs. The QSAR that corresponds to the best fit between observed activities and computed molecular descriptors defines the specific requirements for each MSA task.
The tasks in the MSA process are:
1. Generate conformations -- Generate and analyze conformers of the structures used as the training set for constructing a 3D QSAR. For more information, see the section 1. Generating conformations.
2. Hypothesize an active conformer -- This part of the process generates a structure that corresponds to the structure present in the rate-limiting step controlling biological action. Typically, this step involves ligand-receptor binding, but it may involve metabolic activation or deactivation, membrane transport, or generation of a transition state.
3. Select a candidate shape reference compound -- The shape reference compound is the molecule that is used when shape descriptors are calculated for the study table. To select the reference compound, each compound in the data set is tested in one or more possible active conformations.
4. Perform pair-wise molecular shape superpositions -- MSA requires that each compound in the range of active candidate conformations be aligned and compared with the shape reference compound. The fourth step in MSA is to perform pair-wise molecular superpositions to determine which atoms of the data set compounds are equivalent to atoms in the shape reference compound, and how they correspond.
5. Measure molecular shape commonality -- Calculate shape descriptors to compare the properties that two molecules have in common and to measure molecular shape commonality.
a. Choose the Descriptors/Select... menu item in the study table to select descriptors with the Descriptors control panel.
c. Select shape descriptors by clicking the appropriate row in the first column.
d. Click the ADD button to add the descriptor to the study table.
6. Determine other molecular features -- You can also add other molecular properties to the QSAR by calculating non-shape descriptors. QSAR+ includes a wide variety of spatial, electronic, and thermodynamic descriptors that are possible additional sources of features governing biological activity. In this step, you select the specific descriptors that are appropriate for your study and add them to the study table. For information on descriptors, see Chapter 7, Working with Descriptors.
7. Construct a trial QSAR -- The final step in MSA is to construct a trial QSAR. You select the data that you want to include in the QSAR equation by specifying the structures to be included and the conformation to use for each structure. Please see the section 7. Generating a trial QSAR on page 193.
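Step 5 measures how much shape two aligned molecules share. As a rough illustration only (the shape descriptors that QSAR+ actually calculates are covered in Chapter 7), the following Python sketch estimates a common overlap volume by counting grid points that fall inside both molecules, treating each atom as a hard sphere. The function names, the sphere radii, and the grid spacing are assumptions made for this example.

```python
import math

def inside(point, atoms):
    """True if the point lies inside any atom sphere given as (x, y, z, radius)."""
    px, py, pz = point
    return any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
               for x, y, z, r in atoms)

def overlap_volume(mol_a, mol_b, spacing=0.2):
    """Estimate the volume shared by two already-aligned molecules on a cubic grid."""
    atoms = mol_a + mol_b
    # Bounding box that encloses every atom sphere of both molecules.
    lo = [min(a[i] - a[3] for a in atoms) for i in range(3)]
    hi = [max(a[i] + a[3] for a in atoms) for i in range(3)]
    counts = [int(math.ceil((hi[i] - lo[i]) / spacing)) for i in range(3)]
    shared = 0
    for ix in range(counts[0]):
        for iy in range(counts[1]):
            for iz in range(counts[2]):
                # Sample the centre of each grid cell.
                p = (lo[0] + (ix + 0.5) * spacing,
                     lo[1] + (iy + 0.5) * spacing,
                     lo[2] + (iz + 0.5) * spacing)
                if inside(p, mol_a) and inside(p, mol_b):
                    shared += 1
    return shared * spacing ** 3

# Hypothetical two-atom fragments (coordinates and radii in angstroms).
reference = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
candidate = [(0.2, 0.0, 0.0, 1.7), (1.6, 0.3, 0.0, 1.5)]
common = overlap_volume(reference, candidate)
```

A finer grid spacing gives a better estimate at the cost of more sampling points. The value is meaningful only after the two molecules have been superimposed, which is why alignment (Step 4) comes first.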
A unique aspect of MSA is that the QSAR is optimized not only over combinations of descriptor sets but also over the choices made in each of Steps 2 through 6. This optimizes both the molecular shape similarity and commonality contributions to the QSAR and the contribution of other descriptors. Optimization requires the steps to be iterative, as indicated in the figure summarizing the MSA process on page 179. For information on constructing a QSAR, see Chapter 2, QSAR+ QuickStart.
To deal with conformations, the study table for an MSA QSAR has a third dimension. QSAR+ calculates and stores shape and non-shape descriptor values for each conformer generated from the structures in the study table. The information is stored in a separate Conformations table created for each structure. You determine the set of conformer data that you want to use by specifying the conformer to be used in the study table. Typically, the set you use consists of the conformers most similar to the shape reference.
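One way to picture the third dimension is as a nested table: each study table row owns its own list of conformer rows, plus a pointer to the conformer currently in use. The Python sketch below is only a mental model. The class names, the fields, and the shape_similarity descriptor key are hypothetical and do not reflect how QSAR+ stores its data.

```python
from dataclasses import dataclass, field

@dataclass
class Conformer:
    energy: float        # conformer energy, e.g. in kcal/mol
    descriptors: dict    # descriptor name -> value for this conformer

@dataclass
class StudyRow:
    name: str
    activity: float
    conformers: list = field(default_factory=list)  # per-structure Conformations table
    selected: int = 0                               # conformer used in the study table

    def current_descriptors(self):
        """Descriptor values the study table would show for this structure."""
        return self.conformers[self.selected].descriptors

def select_most_similar(row, similarity_key="shape_similarity"):
    """Point the row at the conformer most similar to the shape reference."""
    row.selected = max(range(len(row.conformers)),
                       key=lambda i: row.conformers[i].descriptors[similarity_key])
    return row.selected

# Hypothetical structure with three conformers.
row = StudyRow("cpd1", 6.2, [
    Conformer(0.0, {"shape_similarity": 0.4}),
    Conformer(1.2, {"shape_similarity": 0.9}),
    Conformer(2.0, {"shape_similarity": 0.7}),
])
select_most_similar(row)
```

Changing which conformer is selected changes the descriptor values that feed the QSAR equation, without discarding the data held for the other conformers.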
This section describes the process for generating and analyzing conformers of the structures used as the training set for constructing a 3D QSAR. This is the first step in the MSA process described in Overview of molecular shape analysis on page 178. Structures used to generate a 3D QSAR are assumed to be congeneric.
1. Generating conformations
Many descriptors that QSAR+ calculates depend on the 3D structure of a molecule. Shape descriptors used in MSA also depend on conformation. Therefore, when you are constructing a 3D QSAR equation, you may want to generate conformations before calculating descriptors. Doing so ensures that conformation is a factor in generating the equation.
Conformers should be generated from structures that have already been energy-minimized, so that each structure starts in a low-energy conformation. The simplest way to be sure the structures in your study table are minimized is to check the Minimize Energy checkbox (Molecule Preferences control panel), so that all structures are minimized as they are loaded into the study table. For more information, see Chapter 6, Working with Molecules.
Alternatively, you can request that all structures be minimized by checking the Minimize Structures before Generating Conformers checkbox on the QSAR Conformation Generation control panel.
Setting conformation generation preferences
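This section describes conformation generation preferences and the procedure for setting them. Four activities are described in this section: accessing the QSAR Conformation Generation control panel, selecting a conformation generation method, applying an energy cutoff, and specifying the number of conformers.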
Accessing the QSAR Conformation Generation control panel
To open the QSAR Conformation Generation control panel, go to the SHAPE ANALYSIS (MSA) card in the QSAR deck and select Generate Conformations. The QSAR Conformation Generation control panel appears.
Alternatively, select the Preferences/Molecules... menu item, then click the Conformations button in the Molecule Preferences control panel.
Selecting a conformation generation method
The QSAR Conformation Generation control panel offers two alternatives to specify settings for generating conformers. If you check the Conformer Search Module checkbox, conformers are generated using the settings you have specified in Cerius2 Conformer Search. For information on Conformational Search, see Cerius2 Conformational Search and Analysis.
Checking the This Panel checkbox allows you to use the QSAR Conformation Generation control panel to specify settings for generating conformers.
To select a method for conformational search
Check the Use Optimal Search Method checkbox or the Use checkbox to specify a conformational search method. If you select Use Optimal Search Method, MSA chooses the method best suited to the structures in the study table for generating the lowest-energy conformers.
If you want to select a particular method to generate conformers, check the Use checkbox. Available methods include Grid Scan, Random Sample, and Boltzmann Jump.
A brief description of each available conformational search method is provided below. For more information, see Cerius2 Conformational Search and Analysis.
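As a rough orientation, the three methods differ in how they move through torsion-angle space: a grid scan steps systematically, random sampling draws values at random, and a Boltzmann jump accepts or rejects random perturbations with a Metropolis-style test. The Python sketch below illustrates these general ideas on a toy energy function. It is not the Cerius2 implementation, and the energy function, step sizes, and temperature are assumptions chosen for the example.

```python
import itertools
import math
import random

def toy_energy(torsions):
    """Hypothetical torsional energy: one threefold cosine term per rotatable bond."""
    return sum(1.5 * (1 + math.cos(math.radians(3 * t))) for t in torsions)

def grid_scan(n_torsions, step=30):
    """Systematically visit every combination of torsion values on a regular grid."""
    angles = range(-180, 180, step)
    return [tuple(combo) for combo in itertools.product(angles, repeat=n_torsions)]

def random_sample(n_torsions, n_conformers, rng):
    """Draw each torsion value independently and uniformly at random."""
    return [tuple(rng.uniform(-180, 180) for _ in range(n_torsions))
            for _ in range(n_conformers)]

def boltzmann_jump(start, n_steps, rng, temperature=300.0, max_jump=60.0):
    """Random torsion perturbations accepted with the Metropolis criterion."""
    kT = 0.0019872 * temperature  # gas constant in kcal/(mol K) times temperature
    current, e_current = tuple(start), toy_energy(start)
    visited = [current]
    for _ in range(n_steps):
        # Perturb every torsion and wrap the result back into [-180, 180).
        trial = tuple(((t + rng.uniform(-max_jump, max_jump) + 180) % 360) - 180
                      for t in current)
        e_trial = toy_energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
        visited.append(current)
    return visited

rng = random.Random(0)            # fixed seed so the example is repeatable
grid = grid_scan(2)               # every 30-degree combination of two torsions
sample = random_sample(2, 10, rng)
walk = boltzmann_jump((60.0, 60.0), 50, rng)
```

The grid scan grows exponentially with the number of rotatable bonds, which is one reason stochastic methods are attractive for flexible structures.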
As part of the conformer-generation process, conformers can be minimized. Select this option by checking the Minimize Conformers checkbox.
You can choose to save only unique conformers by checking the Retain Unique Conformers checkbox.
The conformers that are generated, minimized, and retained as unique are located at energy minima.
Applying an energy cutoff
You can specify a maximum energy value for conformers. Click the box next to the option you select. The options are:
Specifying the number of conformers
You can specify the maximum number of conformers to be generated. Use the slider or enter the value in the Generate No More Than... entry box.
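The uniqueness, energy cutoff, and maximum-count settings act together as a filter on the generated conformers. The following Python sketch shows one plausible way such a filter could work. It assumes the cutoff is measured relative to the lowest-energy conformer and that two conformers are duplicates when all their torsions agree within a tolerance. Both assumptions, and all names, are illustrative rather than taken from QSAR+.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def filter_conformers(conformers, energy_window=None, max_count=None, torsion_tol=10.0):
    """Filter (energy, torsions) tuples; returns the survivors sorted by energy."""
    ordered = sorted(conformers, key=lambda c: c[0])
    if not ordered:
        return []
    e_min = ordered[0][0]
    kept = []
    for energy, torsions in ordered:
        # Energy cutoff: everything after this point is even higher in energy.
        if energy_window is not None and energy - e_min > energy_window:
            break
        # Uniqueness: skip conformers whose torsions all match a kept conformer.
        duplicate = any(all(angle_diff(a, b) <= torsion_tol
                            for a, b in zip(torsions, other))
                        for _, other in kept)
        if not duplicate:
            kept.append((energy, torsions))
        # Maximum number of conformers to retain.
        if max_count is not None and len(kept) >= max_count:
            break
    return kept

# Hypothetical conformers: (relative energy in kcal/mol, torsion angles in degrees).
raw = [(0.0, (60, 60)), (0.1, (62, 58)), (2.0, (180, 60)),
       (12.0, (-60, -60)), (3.0, (60, 180))]
retained = filter_conformers(raw, energy_window=10.0, max_count=2)
```

Because the list is sorted by energy first, the count limit always keeps the lowest-energy unique conformers.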
Generating conformations
Conformers can be generated automatically when you add structures to the study table or whenever you want.
To generate conformers when you have completed specifying conformer-generation settings, click the Generate Conformers button at the top of the QSAR Conformation Generation control panel. Choose from the popup to generate conformers for Current, Selected, or All molecules in the study table.
For an alternative way to generate conformers for structures already loaded in the study table, click Generate Conformers in the QSAR Conformation Generation control panel. To open the Conformation Generation control panel, click Generate Conformations on the MSA card in the QSAR deck.
To specify conformer generation as a default when structures are added to the study table, check the Generate Conformers checkbox in the Molecule Preferences control panel. To open the Molecule Preferences panel, select the Preferences/Molecules menu item in the study table.
For more information on the latter two options, see Chapter 6, Working with Molecules.
Selecting and displaying an active conformer is the second step in the MSA process. The goal is to select the structure that is present in the rate-limiting step that controls activity in a biological reaction.
2. Hypothesizing an active conformer
Before you begin this step, you must do the following:
Selecting the active conformer
Several criteria are available for selecting an active conformer. Before selecting either of the first two criteria, override the default criterion for selecting an active conformer. For more information, see To choose a conformer, below.
2. After indicating how MSA should rank activities, select the method to identify an active conformer by checking the box next to the criterion in the Active Conformation control panel.
3. Click the Select Active Conformer button.
MSA selects the active conformer using the criterion you specified. The name and the activity value (listed in the study table) of the conformer are shown in the Name and Activity boxes of the Active Conformation control panel.
Displaying the active conformer
This section describes how to display the active conformer. Before performing this activity, you must select the active conformer. For more information, see the previous section.
To display the active conformer
Click the Display Active Conformer button on the Active Conformation control panel.
The active conformer is displayed in the model window.
This section describes activities to select and display a shape reference compound. This is the third step in the MSA process. The goal is to identify a compound to be used when shape descriptors are calculated for the study table.
3. Identifying a shape reference compound
Before you begin this step, you must identify an active conformer. For more information on this activity, see the section 2. Hypothesizing an active conformer on page 186.
Accessing the Shape Reference control panel
To open the Shape Reference control panel, choose Shape Reference from the SHAPE ANALYSIS (MSA) card.
Selecting a shape reference compound
Five criteria are available when you select a shape reference compound:
Active Conformation -- MSA selects a conformer of the most active study table molecule. The conformer is the one most likely to result in the measured activity (that is, the active conformer).
Largest Molecule (Volume) -- MSA selects the molecule in the study table with the largest volume to be the shape reference compound.
Largest Molecule (Surface Area) -- MSA selects the molecule in the study table with the largest surface area to be the shape reference compound.
Global Minimum of Most Active -- MSA selects the most active molecule in the study table to be the shape reference compound and minimizes that structure. The conformer that is selected is the lowest-energy conformer. Before you use this criterion, you must override the default selection method for selecting the most active molecule.
Selected Study Molecule -- Manually specify a molecule from the study table to be your shape reference compound. For information on how to make selections, see Chapter 5, Working with the Study Table.
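The five criteria above can be read as five simple selection rules over the study table. The Python sketch below restates them under stated assumptions: the dictionary keys, the criterion strings, and the example values are hypothetical, and the higher_is_more_active flag stands in for the activity-ranking preference mentioned in this chapter.

```python
def most_active(molecules, higher_is_more_active=True):
    """Rank by activity; which direction counts as 'more active' is a user choice."""
    pick = max if higher_is_more_active else min
    return pick(molecules, key=lambda m: m["activity"])

def select_shape_reference(molecules, criterion, higher_is_more_active=True,
                           selected_name=None):
    """Return (molecule name, conformer index or None) for the chosen criterion."""
    if criterion == "active_conformation":
        mol = most_active(molecules, higher_is_more_active)
        return mol["name"], mol["active_conformer"]
    if criterion == "largest_volume":
        return max(molecules, key=lambda m: m["volume"])["name"], None
    if criterion == "largest_surface_area":
        return max(molecules, key=lambda m: m["surface_area"])["name"], None
    if criterion == "global_minimum_of_most_active":
        mol = most_active(molecules, higher_is_more_active)
        energies = mol["conformer_energies"]
        return mol["name"], energies.index(min(energies))
    if criterion == "selected":
        if selected_name not in {m["name"] for m in molecules}:
            raise ValueError("selected molecule is not in the study table")
        return selected_name, None
    raise ValueError(f"unknown criterion: {criterion}")

# Hypothetical study table rows; all values are made up for illustration.
study_table = [
    {"name": "cpd1", "activity": 6.2, "volume": 210.0, "surface_area": 250.0,
     "conformer_energies": [3.1, 0.0, 1.4], "active_conformer": 2},
    {"name": "cpd2", "activity": 7.8, "volume": 198.0, "surface_area": 262.0,
     "conformer_energies": [0.5, 0.0], "active_conformer": 0},
]
reference = select_shape_reference(study_table, "largest_volume")
```

Note that two of the rules depend on how "most active" is defined, which is why the default ranking may need to be overridden before they are used.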
To choose a shape reference compound
1. Before you select a criterion for identifying a shape reference compound, you may need to override the default value that MSA uses to determine the most active molecule. This value is set in the QSAR Preferences control panel, but it can be changed in the Shape Reference control panel. A change of the function on either panel is reflected in both panels. For more information, see Setting molecule-processing preferences on page 113.
2. Select the criterion to identify a shape reference compound by clicking the box next to the criterion in the Shape Reference control panel.
3. Click the Select Shape Reference button.
MSA selects the shape reference compound using the criterion you specified. The name, volume, surface area, and activity value of the compound are displayed in the Name, Volume, Surface Area, and Activity boxes in the Shape Reference control panel.
Displaying the selected shape reference compound
This section describes how to display the shape reference compound. Before performing this activity, you must select a shape reference compound as described in the previous section, Selecting a shape reference compound.
To display the shape reference compound
Click the Display Shape Reference button on the Shape Reference control panel. The model window changes to border mode, and the selected shape reference compound is displayed in the main part of the window.
MSA highlights the rows in the study table and in the Conformers table that contain information about the shape reference compound.
This section describes activities to perform pair-wise molecular superpositions of the shape reference compound with all structures. This is the fourth step in the MSA process. The goal is to determine which atoms in the data set compounds are equivalent to atoms in the shape reference compound, and how they correspond.
4. Aligning molecules
If you do not select a shape reference compound before you perform alignment activities, MSA automatically identifies a shape reference compound using the default selection method identified in the top portion of the Shape Reference control panel. For more information on selecting a shape reference compound, see the section 3. Identifying a shape reference compound on page 187.
Accessing the Shape Reference control panel
To open the Shape Reference control panel, choose Shape Reference from the SHAPE ANALYSIS (MSA) card.
Aligning models
Alignment of structures through pair-wise atom superpositioning places all structures in the study table in the same frame of reference as the shape reference compound. The methods available for aligning models are MCS (maximum common subgraph) and CSS (core substructure search).
The MCS method looks at molecules as points and lines and uses the techniques of graph theory to identify patterns. It finds the largest subset of atoms in the shape reference compound that is shared by all the structures in the study table and uses this subset for alignment.
The CSS method starts with defining a core model, which is a substructure to find and match in all your selected models. The core model itself is just a Cerius2 model regarded as composed of core atoms and substitution sites. Core atoms are the atoms in your core model that exactly match a substructure in your align models, and substitution sites represent sites where the core model and the matched align models differ.
MCS is a general procedure for finding a common substructure between two models (and hence for defining atom matches), but the underlying algorithm is based on an exhaustive tree search and can take a significant amount of time for models that have a highly branched structure, e.g., fused ring systems.
CSS has the disadvantage that you have to specify a suitable core model that you know is a common substructure for all your align models. However, a CSS search is generally much faster than an MCS search and gives much more control over the resultant atom matches. In addition, because each of your align models is first matched to a single core model (these matches being stored internally), matching all align models to each other takes only a little longer than matching all align models to a single target model. (With MCS, each pair of align models is matched independently.)
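To make the cost of an exhaustive tree search concrete, the following Python sketch finds a largest common subgraph between two small molecular graphs by backtracking. It is a simplified illustration, not the Cerius2 algorithm: atoms match when their element symbols are equal, bonds are unweighted, and the matched atoms are not required to form a single connected fragment. The function name and the input format are assumptions.

```python
def max_common_subgraph(labels_a, bonds_a, labels_b, bonds_b):
    """Return the largest atom mapping {index_in_a: index_in_b} found by backtracking."""
    adj_a = {frozenset(b) for b in bonds_a}
    adj_b = {frozenset(b) for b in bonds_b}
    n_a, n_b = len(labels_a), len(labels_b)
    best = {}

    def compatible(i, j, mapping):
        # Every already-matched pair must agree on whether a bond is present.
        return all((frozenset((i, k)) in adj_a) == (frozenset((j, m)) in adj_b)
                   for k, m in mapping.items())

    def extend(i, mapping, used_b):
        nonlocal best
        # Prune: even matching every remaining atom could not beat the best so far.
        if len(mapping) + (n_a - i) <= len(best):
            return
        if i == n_a:
            best = dict(mapping)
            return
        for j in range(n_b):
            if j not in used_b and labels_a[i] == labels_b[j] and compatible(i, j, mapping):
                mapping[i] = j
                used_b.add(j)
                extend(i + 1, mapping, used_b)
                del mapping[i]
                used_b.remove(j)
        extend(i + 1, mapping, used_b)  # branch in which atom i stays unmatched

    extend(0, {}, set())
    return best

# Heavy-atom graphs of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
pairs = max_common_subgraph(*ethanol, *propanol)
```

Each atom in the first model can be paired with any compatible atom in the second or left unmatched, so the number of branches grows quickly with molecule size and branching. This is the behavior that makes a predefined core model attractive.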
Aligning models by MCS method
1. Determine what portion of the shape reference compound (target) you want to use to align the study table structures by setting the Align to Targets Using popup in the Shape Reference control panel to ALL or SELECTED.
4. Click the ALIGN pushbutton.
MSA calculates alignment information for every selected structure in the study table except the shape reference compound. This information consists of atom pairings that describe how each structure matches the shape reference compound. MSA performs a rigid fit to superimpose each structure so that it overlays the shape reference compound.
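A rigid fit moves one structure as a whole, by rotation and translation only, so that its matched atoms lie as close as possible to their partners in the reference. The Python sketch below shows the least-squares idea in two dimensions, where the optimal rotation angle has a simple closed form. Real molecular superposition is done in three dimensions, and this sketch makes no claim about the method MSA uses. The function name and the example coordinates are hypothetical.

```python
import math

def rigid_fit_2d(mobile, target):
    """Least-squares rotation and translation of `mobile` onto `target`.

    Both arguments are equal-length lists of (x, y) points, where mobile[i]
    is the atom paired with target[i]. Returns (fitted points, RMSD).
    """
    n = len(mobile)
    cmx, cmy = sum(p[0] for p in mobile) / n, sum(p[1] for p in mobile) / n
    ctx, cty = sum(q[0] for q in target) / n, sum(q[1] for q in target) / n
    # Centre both point sets on their centroids.
    mob = [(x - cmx, y - cmy) for x, y in mobile]
    tgt = [(x - ctx, y - cty) for x, y in target]
    # The optimal angle maximises the summed dot products after rotation.
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(mob, tgt))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(mob, tgt))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about the origin, then move onto the target centroid.
    fitted = [(c * px - s * py + ctx, s * px + c * py + cty) for px, py in mob]
    rmsd = math.sqrt(sum((fx - qx) ** 2 + (fy - qy) ** 2
                         for (fx, fy), (qx, qy) in zip(fitted, target)) / n)
    return fitted, rmsd

# The mobile points are the target rotated by 90 degrees and shifted.
target_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
mobile_pts = [(3.0, -1.0), (3.0, 0.0), (1.0, -1.0)]
fitted_pts, fit_rmsd = rigid_fit_2d(mobile_pts, target_pts)
```

The atom pairings must be known before the fit can be computed, which is why atom matching precedes superposition. The remaining RMSD indicates how well the matched atoms overlay.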
If you specified overlay mode for the display, the results of the alignment are displayed in the model window. If you checked Recalculate Descriptors After Align, these calculations are completed automatically after the alignment is complete.
You can also align models by selecting Align Models from the ALIGN MOLECULES card in the DRUG DISCOVERY card deck. From the Align Models control panel, you can perform both MCS and CSS alignment.
Please see the online help for additional information.
Aligning models by CSS method
To perform CSS matching, select Align Models from the ALIGN MOLECULES card in the DRUG DISCOVERY card deck to open the Align Models control panel.
Clicking the DEFINE button specifies that the current model is the CSS core model. Clicking the Match Atoms using CSS Search button starts the process of atom matching for the selected models. The effect of the matching is the same as for MCS matching, i.e., a set of editable atom pairs is displayed in the Viewer window and the number of atom matches is displayed in the Align Models table.
Core atoms and substitution sites in your core model are identified according to the RMS Align setting in the Align Preferences control panel:
To delete alignment information
2. Click Clear Alignment information from... Molecule(s).
All alignment information for the specified structures is deleted.
Selecting and calculating shape descriptors is described in Chapter 7, Working with Descriptors. 
5. Measuring molecular shape commonality
Adding nonshape descriptors to the study table is described in Chapter 7, Working with Descriptors. 
6. Determining other molecular features
This section describes the activities involved in selecting training structure data and using that data to generate a trial QSAR equation. In Step 1 of the MSA process, you generate a set of conformers for each structure in the study table. Information about each conformer is stored in a conformers table associated with each study table structure. In selecting a dataset for generating a QSAR equation, all the data, both in the study table and in the conformers table, are considered.
7. Generating a trial QSAR
After you complete the QSAR-generation process and examine the equation that is generated, you can repeat the entire process or any portion of it as described in Overview of molecular shape analysis on page 178. The outcome of the process is an optimized QSAR equation that corresponds to the best fit between observed activities and computed molecular descriptor data.
The information in this section assumes that you have completed Steps 1 through 6 (above) of the MSA process, including generating shape and nonshape descriptors for all molecules in the study table.
Accessing the Select Conformers control panel
To open the Select Conformers control panel, select Select Conformers from the SHAPE ANALYSIS (MSA) menu card.
Selecting conformers
Select appropriate conformers to use in generating a 3D QSAR. You can use both shape and nonshape 3D descriptors and apply the selection process to some or all of the structures in the study table.
The selection process examines all descriptor data for each conformer to determine the conformer with data that best match the shape reference data. The data are evaluated using the partial least squares (PLS) regression method. Since regression analysis is run for all specified conformers and descriptors, the conformer selection process can be lengthy.
5. Click Select to start the selection process.
Based on the choices you have made, MSA recalculates conformation-dependent properties for all specified structures. Then, using the specified descriptor information, it selects the significant conformer for each of the specified structures. The study table is updated with information about the selected conformer for each structure. The process may be lengthy if a large number of structures and conformers are involved.
Generating a trial QSAR equation
When the process of selecting conformers is complete, you are ready to generate a trial QSAR equation. The equation can be modified as often as you want by repeating some or all of the MSA steps until you are satisfied with the equation that is generated.
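The idea of comparing trial equations can be illustrated with a deliberately small Python sketch. It fits a one-descriptor least-squares line for each trial and keeps the trial with the highest r-squared. This is a stand-in for the regression methods available in QSAR+, which are not reproduced here. The function names, the trial labels, and the data values are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept; returns r-squared too."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def best_trial(activities, trials):
    """trials maps a label to descriptor values; returns (label, r-squared) of the best fit."""
    scores = {label: fit_line(values, activities)[2] for label, values in trials.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Made-up activities and one shape descriptor computed under two different
# choices of shape reference compound.
activities = [1.0, 2.0, 3.0, 4.0]
trials = {
    "reference_A": [2.0, 4.0, 6.0, 8.0],
    "reference_B": [1.0, 3.0, 2.0, 4.0],
}
label, r_squared = best_trial(activities, trials)
```

In practice a trial QSAR would use several descriptors and a validated statistic rather than a single-descriptor r-squared. The point is only that each pass through the MSA steps yields another trial to compare.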
To generate a trial QSAR equation
You can generate a QSAR equation using one of several options: