QSAR



10 Performing Molecular Shape Analysis

Molecular shape analysis (MSA) is a formal approach to incorporating conformational flexibility and shape data into a QSAR. QSARs containing this type of data are commonly called 3D QSARs. The term molecular shape analysis applies to the process described by Hopfinger and Burke (1990) for generating 3D QSARs.

This chapter provides an overview of MSA in the Drug Discovery Workbench:

Accessing molecular shape analysis (this page)

Overview of molecular shape analysis (page 178)

And a discussion of the steps of the molecular shape analysis process:

1. Generating conformations (page 182)

2. Hypothesizing an active conformer (page 186)

3. Identifying a shape reference compound (page 187)

4. Aligning molecules (page 190)

5. Measuring molecular shape commonality

6. Determining other molecular features (page 193)

7. Generating a trial QSAR (page 193)

This part of the chapter directs you to appropriate chapters in the Cerius2 books where more detailed information about each activity can be found.


Accessing molecular shape analysis

This section describes the procedure for accessing MSA and the things you need to work with the software.

Before you begin

Before you start MSA, you must have a properly licensed copy of the appropriate Cerius2 software, including a copy of QSAR+, installed on your system. If you have any questions about your system setup or your software license, please talk to your system administrator.

You should be familiar with the Cerius2 interface and tools before you begin using MSA. For information on the Cerius2 environment, consult the manuals listed in the preface, How To Use This Book.

Additionally, you must be familiar with the information presented in Chapter 14, Using the Equation Viewer.

To start MSA

Go to the QSAR deck of cards and choose the SHAPE ANALYSIS (MSA) card from the deck.


Overview of molecular shape analysis

This section describes a typical flow of activity and the tasks that make up the MSA process. It provides information to help you use the software effectively and to find more detailed information elsewhere in this and other Cerius2 books.

MSA is an iterative process. Steps may be repeated, with the molecular shape similarities and other descriptors checked and adjusted on each pass, until the resulting QSAR equation has optimal statistical significance.

The goal of MSA is to generate a QSAR equation that incorporates spatial molecular similarity data. The process, as described by Hopfinger and Burke, involves seven tasks illustrated in the following figure:

Figure 6. MSA process


The outcome of the MSA process is an optimized QSAR that can be used for activity estimation and ligand evaluation. The set of choices available for each task is employed to generate trial QSARs. The QSAR that corresponds to the best fit between observed activities and computed molecular descriptors defines the specific requirements for each MSA task.

MSA tasks

The tasks in the MSA process are:

1.   Generate conformers -- The purpose of this task is to generate and analyze conformers for each structure to be investigated, then to reduce the number of conformers to those that are likely to be relevant to biological activity. You can do the analysis automatically as structures are loaded into the study table or at any time after structures are loaded.

In Cerius2, conformation generation can be done in several ways, primarily depending on the size of the molecule and the set of rotatable torsion angles involved. Please see the section 1. Generating conformations on page 182, and Cerius2 Conformational Search and Analysis.

2.   Hypothesize an active conformer -- This part of the process generates a structure that corresponds to the structure present in the rate-limiting step controlling biological action. Typically, this step involves ligand-receptor binding, but it may involve metabolic activation or deactivation, membrane transport, or generation of a transition state.

MSA was developed to treat QSARs in which the geometry of the receptor is unknown. All information about the active conformer must be gleaned from observed biological activities and from corresponding intramolecular conformational properties computed for the ligands. If X-ray binding data are available, they can be used to specify active conformations.

MSA provides a variety of methods for identifying possible active conformers. Please see the section 2. Hypothesizing an active conformer on page 186.

3.   Select a candidate shape reference compound -- The shape reference compound is the molecule that is used when shape descriptors are calculated for the study table. To select the reference compound, each compound in the data set is tested in one or more possible active conformations.

MSA compares all other molecules in the study table to the shape reference compound and provides information about each comparison. The criterion for selecting the shape reference compound is to optimize the statistical significance of the corresponding QSAR. Please see the section 3. Identifying a shape reference compound on page 187.

4.   Perform pair-wise molecular shape superpositions -- MSA requires that each compound in the range of active candidate conformations be aligned and compared with the shape reference compound. The fourth step in MSA is to perform pair-wise molecular superpositions to determine which atoms of the data set compounds are equivalent to atoms in the shape reference compound, and how.

MSA provides several methods to align entire structures, as well as to select certain atoms to be aligned. Please see the section 4. Aligning molecules on page 190.

5.   Measure molecular shape commonality -- Calculate shape descriptors to compare the properties that two molecules have in common and to measure molecular shape commonality.

Briefly, to select and calculate shape descriptors:

a.   Choose the Descriptors/Select... menu item in the study table to select descriptors with the Descriptors control panel.

b.   Set the Descriptor family popup to MSA and then click the button on the extreme left side. The MSA descriptors are displayed in the table.

c.   Select shape descriptors by clicking the appropriate row in the first column.

d.   Click the ADD button to add the descriptor to the study table.

For more detailed information on this step, see Chapter 7, Working with Descriptors.

6.   Determine other molecular features -- You can also add other molecular properties to the QSAR by calculating non-shape descriptors. Included in QSAR+ are a wide variety of spatial, electronic, and thermodynamic descriptors that constitute possible additional sources of features that govern biological activity. In this step, you select the specific descriptors that are appropriate for your study and add them to the study table. For information on descriptors, see Chapter 7, Working with Descriptors.

7.   Construct a trial QSAR -- The final step in MSA is to construct a trial QSAR. You select the data that you want to include in the QSAR equation by specifying the structures to be included and the conformation to use for each structure. Please see the section 7. Generating a trial QSAR on page 193.

A unique aspect of MSA in generating a QSAR is that, not only are combinations of descriptor sets considered in optimizing the statistical significance of the QSAR, but the QSAR is also optimized for each of Steps 2 through 6. This is to optimize the molecular shape similarity and commonality contributions of the QSAR and the contribution of other descriptors. Optimization requires the steps to be iterative, as indicated in the figure summarizing the MSA process on page 179. For information on constructing a QSAR, see Chapter 2, QSAR+ QuickStart.

To deal with conformations, the study table for an MSA QSAR has a third dimension. QSAR+ calculates and stores shape and non-shape descriptor values for each conformer generated from the structures in the study table. The information is stored in a separate Conformations table created for each structure. You determine the set of conformer data that you want to use by specifying the conformer to be used in the study table. Typically, the dataset is the set most similar to the shape reference.
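The "third dimension" of the study table can be pictured with a small data structure. The following Python sketch is illustrative only and is not part of Cerius2 or QSAR+; the class names, the field names, and the `shape_overlap` descriptor key are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Conformer:
    """One row of a per-structure Conformations table (hypothetical layout)."""
    energy: float
    descriptors: dict[str, float] = field(default_factory=dict)


@dataclass
class StudyRow:
    """One study table row plus its third dimension: the conformers."""
    name: str
    activity: float
    conformers: list[Conformer] = field(default_factory=list)
    chosen: int = 0  # index of the conformer whose data the row exposes

    def descriptor(self, key: str) -> float:
        # The study table shows values for the chosen conformer only.
        return self.conformers[self.chosen].descriptors[key]


row = StudyRow(
    name="ligand_1",
    activity=6.2,
    conformers=[
        Conformer(energy=0.0, descriptors={"shape_overlap": 210.0}),
        Conformer(energy=1.4, descriptors={"shape_overlap": 245.0}),
    ],
)
row.chosen = 1  # switch the row to the second conformer
print(row.descriptor("shape_overlap"))  # prints 245.0
```

Changing `chosen` corresponds to specifying which conformer the study table should use for a structure; the descriptor values for every other conformer remain stored alongside it.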


1. Generating conformations

This section describes the process for generating and analyzing conformers of the structures used as the training set for constructing a 3D QSAR. This is the first step in the MSA process described in Overview of molecular shape analysis on page 178. Structures used to generate a 3D QSAR are assumed to be congeneric.

Many descriptors that QSAR+ calculates depend on the 3D structure of a molecule. Shape descriptors used in MSA also depend on conformation. Therefore, when you are constructing a 3D QSAR equation, you may want to generate conformations before calculating descriptors. In doing this, you assure that conformation is a factor in generating the equation.

This section describes

Setting conformation generation preferences (page 183)

Generating conformations (page 185)

Before you begin

Conformers should be generated using structures that have been subjected to energy minimization to generate a low-energy conformation for each structure. The simple way to be sure the structures in your study table are minimized is to check the Minimize Energy checkbox (Molecule Preferences control panel), so that all structures are minimized as they are loaded into the study table. Please see the Working with Molecules chapter.

Alternatively, you can request that all structures be minimized by checking the Minimize Structures before Generating Conformers checkbox on the QSAR Conformation Generation control panel.

Setting conformation generation preferences

This section describes conformation generation preferences and the procedure for setting them. Four activities are described in this section:

Accessing the QSAR Conformation Generation control panel (this page).

Selecting a conformation generation method (page 183).

Applying an energy cutoff (page 184).

Specifying the number of conformers to be generated (page 185).

Use this portion of MSA if you are working with 3D descriptors and you want conformation to be a factor in generating a QSAR equation.

Accessing the QSAR Conformation Generation control panel

To open the QSAR Conformation Generation control panel, go to the SHAPE ANALYSIS (MSA) card in the QSAR deck and select Generate Conformations. The QSAR Conformation Generation control panel appears.

Alternatively, select the Preferences/Molecules... menu item, then click the Conformations button in the Molecule Preferences control panel.

Selecting a conformation generation method

The QSAR Conformation Generation control panel offers two alternatives to specify settings for generating conformers. If you check the Conformer Search Module checkbox, conformers are generated using the settings you have specified in Cerius2 Conformer Search. For information on Conformational Search, see Cerius2 Conformational Search and Analysis.

Checking the This Panel checkbox allows you to use the QSAR Conformation Generation control panel to specify settings for generating conformers.

To select a method for conformational search

Check the Use Optimal Search Method checkbox or the Use checkbox to specify a conformational search method. If you select Use Optimal Search Method, MSA uses the best method to apply to the structures in the study table to generate the lowest-energy conformers.

If you want to select a particular method to generate conformers, check the Use checkbox. Available methods include Grid Scan, Random Sample, and Boltzmann Jump.

A brief description of each available conformational search method is provided below. For more information, see Cerius2 Conformational Search and Analysis.

Grid Scan -- This method is used to perform a simple systematic search in which each specified torsion angle is varied over a grid of equally spaced values.

Random Sample -- In this method, the starting conformation of a structure is perturbed by randomly altering values of all variable torsion angles. Each angle is assigned a value within a specified torsion angle window.

Boltzmann Jump -- In this method, torsion angles of a molecule are randomly altered within a specified angle window. After each random move, the Metropolis method is used to accept or reject the move.
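The three search methods described above can be sketched in a few lines of Python. This is a minimal illustration of the general techniques, not the Cerius2 implementation. The function names, the toy threefold torsional energy, and the reading of the "window" as a plus-or-minus perturbation around the current angle are all assumptions.

```python
import itertools
import math
import random


def grid_scan(n_torsions, step_deg):
    """Return every combination of equally spaced torsion values (degrees)."""
    values = range(0, 360, step_deg)
    return itertools.product(values, repeat=n_torsions)


def random_sample(torsions, window_deg, rng):
    """Perturb every torsion by a random amount within +/- window_deg."""
    return [(t + rng.uniform(-window_deg, window_deg)) % 360.0 for t in torsions]


def metropolis_accept(delta_e, kt, rng):
    """Metropolis test: accept downhill moves, uphill with exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def boltzmann_jump(torsions, energy, window_deg, kt, n_steps, rng):
    """Random torsion moves, each accepted or rejected by the Metropolis test."""
    current = list(torsions)
    e_current = energy(current)
    for _ in range(n_steps):
        trial = random_sample(current, window_deg, rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kt, rng):
            current, e_current = trial, e_trial
    return current, e_current


def toy_energy(torsions):
    """Toy threefold torsional potential (an assumption, arbitrary units)."""
    return sum(1.0 + math.cos(3.0 * math.radians(t)) for t in torsions)


# Two torsions on a 120-degree grid give 3 x 3 = 9 grid points.
print(len(list(grid_scan(2, 120))))  # prints 9

rng = random.Random(0)
final, e_final = boltzmann_jump([60.0, 180.0], toy_energy, 30.0, 0.6, 200, rng)
print(final, e_final)
```

The grid scan grows as the number of grid values raised to the number of torsions, which is one reason the choice of method depends on molecular size and on how many torsions are rotatable.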

The conformational search method that you select can be applied automatically when you add structures to the study table or at any time.

As part of the conformer-generation process, conformers can be minimized. Select this option by checking the Minimize Conformers checkbox.

You can choose to save only unique conformers by checking the Retain Unique Conformers checkbox.

The conformers that are generated, minimized, and retained as unique are located at energy minima.

Applying an energy cutoff

You can specify a maximum energy value for conformers by clicking the box next to the option you want. The options include Relative and Absolute.

If you select Relative or Absolute, use the slider or enter an appropriate value in the Cutoff entry box.

Specifying the number of conformers

You can specify the maximum number of conformers to be generated. Use the slider or enter the value in the Generate No More Than... entry box.
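As an illustration of how an energy cutoff and a conformer limit interact, the Python sketch below filters a list of conformer energies. It is not the QSAR+ implementation. It assumes that Relative means "relative to the lowest-energy conformer found" and that, when a limit on the number of conformers applies, the lowest-energy ones are kept; the function and parameter names are hypothetical.

```python
def filter_conformers(energies, mode="relative", cutoff=None, max_count=None):
    """Return indices of retained conformers, lowest energy first."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    if cutoff is not None and order:
        if mode == "relative":
            # Assumption: the cutoff is measured from the lowest energy found.
            limit = energies[order[0]] + cutoff
        elif mode == "absolute":
            limit = cutoff
        else:
            raise ValueError(f"unknown cutoff mode: {mode!r}")
        order = [i for i in order if energies[i] <= limit]
    if max_count is not None:
        order = order[:max_count]
    return order


energies = [12.0, 10.0, 15.5, 10.8]
print(filter_conformers(energies, "relative", cutoff=2.0))               # prints [1, 3, 0]
print(filter_conformers(energies, "absolute", cutoff=11.0))              # prints [1, 3]
print(filter_conformers(energies, "relative", cutoff=2.0, max_count=2))  # prints [1, 3]
```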

Generating conformations

Conformers can be generated automatically when you add structures to the study table or whenever you want.

To generate conformers when you have completed specifying conformer-generation settings, click the Generate Conformers button at the top of the QSAR Conformation Generation control panel. Choose from the popup to generate conformers for Current, Selected, or All molecules in the study table.

For an alternative way to generate conformers for structures already loaded in the study table, click Generate Conformers in the QSAR Conformation Generation control panel. To open the Conformation Generation control panel, click Generate Conformations on the MSA card in the QSAR deck.

To specify conformer generation as a default when structures are added to the study table, check the Generate Conformers checkbox in the Molecule Preferences control panel. To open the Molecule Preferences panel, select the Preferences/Molecules menu item in the study table.

For more information on the latter two options, see Chapter 6, Working with Molecules.


2. Hypothesizing an active conformer

Selecting and displaying an active conformer is the second step in the MSA process. The goal is to select the structure that is present in the rate-limiting step that controls activity in a biological reaction.

This section describes

Accessing the Active Conformation control panel (this page)

Selecting the active conformer (page 186)

Displaying the active conformer (page 187)

Before you begin

Before you begin this step, you must do the following:

Accessing the Active Conformation control panel

To open the Active Conformation control panel, choose Active Conformation from the SHAPE ANALYSIS (MSA) card.

Selecting the active conformer

Several criteria are available for selecting an active conformer. Before selecting either of the first two criteria, override the default criterion for selecting an active conformer. For more information, see To choose a conformer, below.

Global Minimum of Most Active -- MSA looks at the global minimum of the most active compound in the study table (based on the value in the Activity column) and makes it the active conformer.

Selected Study Molecule -- Specify that the selected conformer be the active conformer.

To choose a conformer

1.   Before you select a method for identifying an active conformer, you may need to override the default value that MSA uses to determine the most active molecule.

This value is set in the QSAR Preferences control panel with the Bigger Activity Values Indicate Greater Activity checkbox. To open the QSAR Preferences control panel, select the QSAR Preferences/General... menu item.

To specify that higher numbers indicate greater activity, check the Bigger Activity Values Indicate Greater Activity checkbox.

To specify that smaller numbers indicate greater activity, uncheck the Bigger Activity Values Indicate Greater Activity checkbox.

2.   After indicating how MSA should rank activities, select the method to identify an active conformer by checking the box next to the criterion in the Active Conformation control panel.

3.   Click the Select Active Conformer button.

MSA selects the active conformer using the criterion you specified. The name and the activity value (listed in the study table) of the conformer are shown in the Name and Activity boxes of the Active Conformation control panel.
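The Global Minimum of Most Active criterion, together with the activity-ranking preference, amounts to the selection logic sketched below in Python. This is an illustrative sketch rather than MSA's own code; the dictionary keys and function names are assumptions, and the `bigger_is_more_active` flag stands in for the Bigger Activity Values Indicate Greater Activity checkbox.

```python
def most_active(rows, bigger_is_more_active=True):
    """Pick the study table row with the greatest activity."""
    pick = max if bigger_is_more_active else min
    return pick(rows, key=lambda r: r["activity"])


def global_minimum_of_most_active(rows, bigger_is_more_active=True):
    """Return (name, conformer index) of the lowest-energy conformer
    of the most active compound."""
    row = most_active(rows, bigger_is_more_active)
    energies = row["conformer_energies"]
    index = min(range(len(energies)), key=lambda i: energies[i])
    return row["name"], index


rows = [
    {"name": "A", "activity": 5.1, "conformer_energies": [3.0, 1.2]},
    {"name": "B", "activity": 7.4, "conformer_energies": [0.9, 0.4, 2.2]},
]
print(global_minimum_of_most_active(rows))         # prints ('B', 1)
print(global_minimum_of_most_active(rows, False))  # prints ('A', 1)
```

The second call shows why the ranking preference matters: if activity is recorded as a quantity where smaller is better (for example, a concentration), leaving the flag at its default would select the least active compound.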

Displaying the active conformer

This section describes how to display the active conformer. Before performing this activity, you must select the active conformer. For more information, see the previous section.

To display the active conformer

Click the Display Active Conformer button on the Active Conformation control panel.

The active conformer is displayed in the model window.


3. Identifying a shape reference compound

This section describes activities to select and display a shape reference compound. This is the third step in the MSA process. The goal is to identify a compound to be used when shape descriptors are calculated for the study table.

This section describes

Accessing the Shape Reference control panel (this page)

Selecting a shape reference compound (page 188)

Displaying the selected shape reference compound (page 189)

Before you begin

Before you begin this step, you must identify an active conformer. For more information on this activity, see the section 2. Hypothesizing an active conformer on page 186.

Accessing the Shape Reference control panel

To open the Shape Reference control panel, choose Shape Reference from the SHAPE ANALYSIS (MSA) card.

Selecting a shape reference compound

Five criteria are available when you select a shape reference compound:

Active Conformation -- MSA selects a conformer of the most active study table molecule. The conformer is the one most likely to result in the measured activity (that is, the active conformer).

Largest Molecule (Volume) -- MSA selects the molecule in the study table with the largest volume to be the shape reference compound.

Largest Molecule (Surface Area) -- MSA selects the molecule in the study table with the largest surface area to be the shape reference compound.

Global Minimum of Most Active -- MSA selects the most active molecule in the study table to be the shape reference compound and minimizes that structure. The conformer that is selected is the lowest-energy conformer. Before you use this criterion, you must override the default selection method for selecting the most active molecule.

Selected Study Molecule -- Manually specify a molecule from the study table to be your shape reference compound. For information on how to make selections, see Chapter 5, Working with the Study Table.

To choose a shape reference compound

1.   Before you select a criterion for identifying a shape reference compound, you may need to override the default value that MSA uses to determine the most active molecule. This value is set in the QSAR Preferences control panel, but it can be changed in the Shape Reference control panel. A change of the function on either panel is reflected in both panels. For more information, see Setting molecule-processing preferences on page 113.

To specify that larger numbers indicate greater activity, check the Bigger Activity Values Indicate Greater Activity checkbox at the bottom of the Shape Reference control panel.

To specify that smaller numbers indicate greater activity, uncheck the Bigger Activity Values Indicate Greater Activity checkbox at the bottom of the Shape Reference control panel.

2.   Select the criterion to identify a shape reference compound by clicking the box next to the criterion in the Shape Reference control panel.

3.   Click the Select Shape Reference button.

MSA selects the shape reference compound using the criterion you specified. The name, volume, surface area, and activity value of the compound are displayed in the Name, Volume, Surface Area, and Activity boxes in the Shape Reference control panel.
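The size-based and activity-based criteria can be viewed as different sort keys over the same study table. The Python sketch below shows that idea with a small dispatch table; it is a hypothetical illustration, and the criterion keys, field names, and values are invented for the example rather than taken from QSAR+.

```python
def select_shape_reference(rows, criterion, bigger_is_more_active=True):
    """Return the name of the row chosen by the given criterion."""
    pick_activity = max if bigger_is_more_active else min
    criteria = {
        "largest_volume": lambda: max(rows, key=lambda r: r["volume"]),
        "largest_surface_area": lambda: max(rows, key=lambda r: r["surface_area"]),
        "most_active": lambda: pick_activity(rows, key=lambda r: r["activity"]),
    }
    if criterion not in criteria:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return criteria[criterion]()["name"]


table = [
    {"name": "A", "volume": 310.0, "surface_area": 295.0, "activity": 5.1},
    {"name": "B", "volume": 290.0, "surface_area": 320.0, "activity": 7.4},
]
print(select_shape_reference(table, "largest_volume"))        # prints A
print(select_shape_reference(table, "largest_surface_area"))  # prints B
print(select_shape_reference(table, "most_active"))           # prints B
```

Because different criteria can pick different compounds from the same data, trying more than one criterion and comparing the resulting QSARs fits the iterative nature of the MSA process.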

Displaying the selected shape reference compound

This section describes how to display the shape reference compound. Before performing this activity, you must select a shape reference compound as described in the previous section, Selecting a shape reference compound.

To display the shape reference compound

Click the Display Shape Reference button on the Shape Reference control panel. The model window changes to border mode, and the selected shape reference compound is displayed in the main part of the window.

MSA highlights the rows in the study table and in the Conformers table that contain information about the shape reference compound.


4. Aligning molecules

This section describes activities to perform pair-wise molecular superpositions of the shape reference compound with all structures. This is the fourth step in the MSA process. The goal is to determine which atoms in the data set compounds are equivalent to atoms in the shape reference compound, and how.

This section describes

Accessing the Shape Reference control panel (this page)

Aligning models (page 190)

Removing alignment information (page 193)

Before you begin

If you do not select a shape reference compound before you perform alignment activities, MSA automatically identifies a shape reference compound using the default selection method identified in the top portion of the Shape Reference control panel. For more information on selecting a shape reference compound, see the section 3. Identifying a shape reference compound on page 187.

Accessing the Shape Reference control panel

To open the Shape Reference control panel, choose Shape Reference from the SHAPE ANALYSIS (MSA) card.

Aligning models

Alignment of structures through pair-wise atom superpositioning places all structures in the study table in the same frame of reference as the shape reference compound. The methods available for aligning models are MCS (maximum common subgraph) and CSS (core substructure search).

The MCS method looks at molecules as points and lines and uses the techniques of graph theory to identify patterns. It finds the largest subset of atoms in the shape reference compound that is shared by all the structures in the study table and uses this subset for alignment.

The CSS method starts with defining a core model, which is a substructure to find and match in all your selected models. The core model itself is just a Cerius2 model regarded as composed of core atoms and substitution sites. Core atoms are the atoms in your core model that exactly match a substructure in your align models, and substitution sites represent sites where the core model and the matched align models differ.

MCS and CSS compared

MCS is a general procedure for finding a common substructure between two models (and hence for defining atom matches), but the underlying algorithm is based on an exhaustive tree search and can take a significant amount of time for models that have a highly branched structure, e.g., fused ring systems.

CSS has the disadvantage that you have to specify a suitable core model that you know is a common substructure for all your align models. However, a CSS search is generally much faster than an MCS search and gives much more control over the resultant atom matches. In addition, because each of your align models is first matched to a single core model (these matches being stored internally), matching all align models to each other takes only a little longer than matching all align models to a single target model. (With MCS each pair of align models is matched independently.)
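The reason stored core matches make all-against-all matching cheap can be shown with a short Python sketch. It is an illustration of the idea only, not the Cerius2 algorithm; the atom labels, atom indices, and function names are assumptions. Once every model has a core-to-atom mapping, the atom pairs for any two models follow by composing the two mappings, with no further search.

```python
from math import comb


def compose_via_core(core_to_a, core_to_b):
    """Atom pairs (a, b) for two models that were each matched to the core."""
    shared = core_to_a.keys() & core_to_b.keys()
    return sorted((core_to_a[c], core_to_b[c]) for c in shared)


# Hypothetical stored matches: core atom label -> atom index in each model.
core_to_a = {"C1": 4, "C2": 5, "N1": 9}
core_to_b = {"C1": 1, "C2": 2, "N1": 7}
print(compose_via_core(core_to_a, core_to_b))  # prints [(4, 1), (5, 2), (9, 7)]

# Searches needed for 10 align models:
n_models = 10
print(n_models)           # core-based: one search per model, prints 10
print(comb(n_models, 2))  # independent pairwise searches, prints 45
```

The counts at the end are simple combinatorics: one substructure search per model when a core is used, against one search per pair of models when each pair is matched independently.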

Aligning models by MCS method

1.   Determine what portion of the shape reference compound (target) you want to use to align the study table structures by setting the Align to Targets Using popup in the Shape Reference control panel to ALL or SELECTED.

2.   Determine what structures you want to align to the shape reference compound by setting the Align Target Molecule(s) popup to ALL structures, SELECTED structures, or only the CURRENT structure.

3.   Check the Overlay Aligned Molecules checkbox if you want molecules to be displayed in overlay mode.

4.   Click the ALIGN pushbutton.

MSA calculates alignment information for every selected structure in the study table except the shape reference compound. This information consists of atom pairings that describe how each structure matches the shape reference compound. MSA performs a rigid fit to superimpose each structure so that it overlays the shape reference compound.

If you specified overlay mode for the display, the results of the alignment are displayed in the model window. If you checked Recalculate Descriptors After Align, these calculations are completed automatically after the alignment is complete.

You can also align models by selecting Align Models from the ALIGN MOLECULES card in the DRUG DISCOVERY card deck. From the Align Models control panel, you can perform both MCS and CSS alignment.

Please see the online help for additional information.

Aligning models by CSS method

To perform CSS matching, select Align Models from the ALIGN MOLECULES card in the DRUG DISCOVERY card deck to open the Align Models control panel.

Clicking the DEFINE button specifies that the current model is the CSS core model. Clicking the Match Atoms using CSS Search button starts the process of atom matching for the selected models. The effect of the matching is the same as for MCS matching, i.e., a set of editable atom pairs is displayed in the Viewer window and the number of atom matches is displayed in the Align Models table.

Core atoms and substitution sites in your core model are identified according to the RMS Align setting in the Align Preferences control panel:

Removing alignment information

You can delete alignment information for some or all the structures in the study table, if you want to realign structures using another alignment method.

To delete alignment information

1.   Specify the molecules for which you want to remove alignment information. You can remove information for all, selected, or current molecules. Make your selection from the popup.

2.   Click Clear Alignment information from... Molecule(s).

All alignment information for the specified structures is deleted.


5. Measure molecular shape commonality

Selecting and calculating shape descriptors is described in Chapter 7, Working with Descriptors.


6. Determining other molecular features

Adding nonshape descriptors to the study table is described in Chapter 7, Working with Descriptors.


7. Generating a trial QSAR

This section describes the activities involved in selecting training structure data and using that data to generate a trial QSAR equation. In Step 1 of the MSA process, you generate a set of conformers for each structure in the study table. Information about each conformer is stored in a conformers table associated with each study table structure. In selecting a dataset for generating a QSAR equation, all the data, both in the study table and in the conformers table, are considered.

After you complete the QSAR-generation process and examine the equation that is generated, you can repeat the entire process or any portion of it as described in Overview of molecular shape analysis on page 178. The outcome of the process is an optimized QSAR equation that corresponds to the best fit between observed activities and computed molecular descriptor data.

This section describes

Accessing the Select Conformers control panel (page 194)

Selecting conformers based on a specified set of descriptor data (page 194)

Generating a trial QSAR equation (page 195)

Before you begin

The information in this section assumes that you have completed Steps 1 through 6 (above) of the MSA process, including generating shape and nonshape descriptors for all molecules in the study table.

Accessing the Select Conformers control panel

To open the Select Conformers control panel, choose Select Conformers from the SHAPE ANALYSIS (MSA) card.

Selecting conformers

Select appropriate conformers to use in generating a 3D QSAR. You can use both shape and nonshape 3D descriptors and apply the selection process to some or all of the structures in the study table.

The selection process examines all descriptor data for each conformer to determine the conformer whose data best match the shape reference data. The data are evaluated using the partial least squares (PLS) regression method. Because regression analysis is run for all specified conformers and descriptors, the conformer selection process can be lengthy.

To select conformers

1.   Determine which descriptors should be used by MSA to identify the significant conformation for each study table molecule. Indicate your choice by clicking the All 3D Descriptors radio box or the Shape Descriptors radio box.

2.   Determine what study table structures should be involved in the conformer-selection process by choosing All, Selected, or Current from the Select Conformers for popup.

3.   Decide if conformation-dependent properties should be recalculated before selecting significant conformations. Check the Regenerate Conformer Tables checkbox if you want recalculation performed.

4.   Decide if you want MSA to generate a QSAR when the conformer-selection process is complete. If so, check the Calculate QSAR When Done checkbox.

5.   Click Select to start the selection process.

Based on the choices you have made, MSA recalculates conformation-dependent properties for all specified structures. Then, using the specified descriptor information, it selects the significant conformer for each of the specified structures. The study table is updated with information about the selected conformer for each structure. The process may be lengthy if a large number of structures and conformers are involved.
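To make the idea of "the conformer whose data best match the shape reference" concrete, the Python sketch below picks, for one structure, the conformer whose descriptor values lie closest to the reference values. This is a deliberate simplification: it substitutes a plain, unscaled Euclidean distance for the PLS-based evaluation that MSA performs, and the descriptor names and numbers are invented for the example.

```python
import math


def closest_conformer(conformers, reference, keys):
    """Index of the conformer whose descriptors are nearest the reference.

    Uses an unscaled Euclidean distance over the chosen descriptor keys.
    This stands in for, and is much simpler than, a PLS-based evaluation.
    """
    def distance(desc):
        return math.sqrt(sum((desc[k] - reference[k]) ** 2 for k in keys))

    return min(range(len(conformers)), key=lambda i: distance(conformers[i]))


reference = {"volume": 300.0, "overlap": 250.0}
conformers = [
    {"volume": 280.0, "overlap": 200.0},
    {"volume": 305.0, "overlap": 245.0},
    {"volume": 330.0, "overlap": 260.0},
]
print(closest_conformer(conformers, reference, ["volume", "overlap"]))  # prints 1
```

Restricting `keys` to shape descriptors, or passing every 3D descriptor, mirrors the choice between the Shape Descriptors and All 3D Descriptors options in step 1.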

Generating a trial QSAR equation

When the process of selecting conformers is complete, you are ready to generate a trial QSAR equation. The equation can be modified as often as you want by repeating some or all of the MSA steps until you are satisfied with the equation that is generated.

To generate a trial QSAR equation

You can generate a QSAR equation using one of several options:

or:




Last updated May 18, 2000 at 05:51PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.