Combi-Chem


3d. Compound selection procedures


Identification of outliers

1.   Start a new Cerius2 session

If you have just finished the previous section, start a new session by selecting the File/New Session item from the Visualizer menu bar. Click Confirm in the Re-initialize All message window. You may also start a brand new Cerius2 session.

2.   Open an empty study table

Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item.

3.   Recall the benzodiazepine library

Open the Open Study Table control panel by selecting the File/Open... item on the menu bar in the Study Table control panel. Select the table benzo.tbl and click the OPEN pushbutton. Alternatively, if you did not complete the earlier part of the tutorial, select the table ./Cerius2-Resources/COMBICHEM/demos/benzo_720.tbl and click the OPEN pushbutton.

4.   Perform a PCA analysis

Run a simple PCA analysis as explained in the previous example, Using principal component analysis. Remember to set Label Row by to Star in the 3D Plot Samples control panel before starting the analysis.

If you forgot to set the plot options, you can re-plot the compounds using the same 3D Plot tool.

5.   Identify outliers in PCA space

Now that the principal components are identified as independent variables, they will be used in the outlier identification experiment in place of the original descriptors.

Go to the LIBRARY ANALYSIS card and select the Property Space Outliers item to open the Property Space Outliers control panel.

Outliers are identified as those molecular models whose distance to the centroid defined by the full set of models is greater than the average distance to the centroid plus a user-defined number of times the standard deviation. This parameter is specified in the Maximum Deviation entry box in the Property Space Outliers control panel.
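The criterion described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the Cerius2 implementation: the function name `find_outliers`, the parameter name `max_deviation`, the use of Euclidean distance, and the use of the population standard deviation are all assumptions.

```python
import math
import statistics

def find_outliers(points, max_deviation=2.0):
    """Return indices of points farther from the centroid than
    mean distance + max_deviation * standard deviation of the distances.

    Assumptions (not stated in the tutorial): Euclidean distance and
    population standard deviation.
    """
    n = len(points)
    dim = len(points[0])
    # Centroid of the full set of points (e.g. rows of PC1, PC2, PC3).
    centroid = [sum(p[k] for p in points) / n for k in range(dim)]
    distances = [math.dist(p, centroid) for p in points]
    threshold = statistics.mean(distances) + max_deviation * statistics.pstdev(distances)
    return [i for i, d in enumerate(distances) if d > threshold]

# Hypothetical data: a 3x3 grid of points plus one distant point.
points = [(x, y, 0) for x in (-1, 0, 1) for y in (-1, 0, 1)] + [(10, 10, 10)]
print(find_outliers(points, max_deviation=2.0))  # [9]
```

Raising `max_deviation` makes the test stricter, so fewer points are flagged; this corresponds to increasing the value in the Maximum Deviation entry box.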

Click the Identify Outliers in Property Space action button on the Property Space Outliers control panel.

6.   Visualize outlier compounds

The number of outliers (4) is listed in the text window. The outlier-compound rows are selected in the study table and their points are red in the Samples Plot graph.

If you want to see just the outliers in the study table, open the Show Selected Molecules control panel by selecting Molecules/Show Selected... from the Study Table control panel's menu bar. Then check the Show Only Selected Rows in Study Table check box.

Only those four rows now appear.

Click the Show Selected Molecules in Models Window action button in the Show Selected Molecules control panel.

The corresponding models are brought back from the original SD file.

7.   Working with outlier compounds

The Property Space Outliers control panel provides several options for working with outliers. They may be deleted from the study table or simply removed from the observations.

Selecting diverse compounds

Cluster-based selections

1.   Start a new Cerius2 session

If you have just finished the previous section, start a new session by selecting the File/New Session item from the Visualizer menu bar. Click Confirm in the Re-initialize All message window. You may also start a brand new Cerius2 session.

2.   Open an empty study table

Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item.

3.   Recall a table of aromatic amine reagents

Open the Open Study Table control panel by selecting the File/Open... item on the menu bar in the Study Table control panel. Select the table ./Cerius2-Resources/COMBICHEM/amines/amines_arom.tbl and click the OPEN pushbutton.

The goal is to group the reagents in this list into families and then select a representative member from each family. This is the basis for library subsetting using diverse selections of reagents. This method is particularly effective when dealing with large arrays of reagents for which full enumeration of the library would be prohibitive.

For more information on clustering, please consult the Combinatorial Chemistry Methodologies section of this documentation.

4.   Identify the independent variables from the study table

Select the columns labeled PC1, PC2, and PC3 and identify these columns as the independent variables by clicking the Set independent icon on the Study Table control panel's tool bar.

5.   Visualizing compounds in PCA space

While the columns PC1, PC2, and PC3 are still selected, select the 3D Plot icon from the Study Table control panel's tool bar. Click the Set XYZ using Selected Columns action button. In the 3D Plot Samples control panel, ensure that the Label Row by popup is set to Star. Do not change the other defaults in the 3D Plot Samples control panel. Then click the 3D Plot pushbutton. Close the 3D Plot Samples control panel.

A representation of the compounds in PCA space appears in the Cerius2 Models window.

6.   Setting up cluster analysis

Select CLUSTER from the popup to the right of the RUN pushbutton in the Study Table control panel.

Then open the Statistical Method Preferences control panel by selecting the Preferences/Statistical Method... item from the menu bar in the Study Table control panel. Set Statistical Method to CLUSTER and Method to HCA/Complete Linkage.

7.   Running cluster analysis

Click the RUN pushbutton on the Study Table control panel.

Two graphs appear in the Cerius2 Graphs window. The graph on the left (Dendrogram) shows the families of compounds, with each branch representing a family. Going down the dendrogram, each family can be traced to its member compounds. The graph on the right (Objective function) may be used to identify the natural break points (regions with a large slope) in the dataset.

8.   Subsetting the reagent list

In the Statistical Method Preferences panel, enter 30 as the number of clusters and press the ccbutton.

The Dendrogram now reflects the identification of 30 clusters: a cross section is taken through the Dendrogram so that it intersects 30 branches, one per cluster. The representation of the compounds in PCA space is also updated so that the clusters are color coded.

One compound is then selected from each cluster, as close as possible to the cluster centroid. The white vertical strips at the bottom of the Dendrogram link each cluster to its representative member. The corresponding rows are also selected in the study table.
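As a rough illustration of what this step computes, the following Python sketch performs agglomerative clustering with complete linkage, stops at a requested number of clusters, and picks the member nearest each cluster centroid. It is a naive implementation meant for small examples; the function names, the Euclidean metric, and the tie-breaking behavior are assumptions and do not describe the Cerius2 internals.

```python
import math

def complete_linkage_clusters(points, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest pair of members
    is closest, until n_clusters clusters remain.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def representatives(points, clusters):
    """Pick, for each cluster, the member closest to the cluster centroid."""
    dim = len(points[0])
    reps = []
    for members in clusters:
        centroid = [sum(points[i][k] for i in members) / len(members)
                    for k in range(dim)]
        reps.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return reps

# Hypothetical reagents described by two principal components.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10), (30, 0)]
clusters = complete_linkage_clusters(points, n_clusters=3)
print(clusters, representatives(points, clusters))
```

In the tutorial the same idea is applied with 30 clusters to the PC1, PC2, and PC3 columns of the amine table, giving 30 representative reagents.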

You could now decide to use this selection of reagents to construct a combinatorial library rather than the complete list. This concludes our example of reagent-based library subsetting.

Distance-based selections

1.   Start a new Cerius2 session

If you have just finished the previous section, start a new session by selecting the File/New Session item from the Visualizer menu bar. Click Confirm in the Re-initialize All message window. You may also start a brand new Cerius2 session.

2.   Open an empty study table

Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item.

3.   Recall the benzodiazepine library

Open the Open Study Table control panel by selecting the File/Open... item on the menu bar in the Study Table control panel. Select the table benzo.tbl and click the OPEN pushbutton. Alternatively, if you did not complete the earlier part of the tutorial, select the table ./Cerius2-Resources/COMBICHEM/demos/benzo_720.tbl and click the OPEN pushbutton.

4.   Perform a PCA analysis

Run a simple PCA analysis as explained in a previous example, Using principal component analysis. Remember to set Label Row by to Star in the 3D Plot Samples control panel before starting the analysis.

If you forgot to set the plot options, you can re-plot the compounds using the same 3D Plot tool.

5.   Select diverse compounds

Go to the LIBRARY ANALYSIS card and select the Select Molecules Diverse Distance-based menu item to open the Select Diverse control panel. Change the number of Molecules using... to 100.

This selects 100 compounds using the MaxMin (Maximum Dissimilarity) metric. For detailed information on diversity metrics (diversity target functions), please refer to the Theory section.

6.   Setting preferences

Open the Analysis Preferences control panel by clicking the Preferences... pushbutton on the bottom left of the Select Diverse control panel. Check the Plot Diversity vs Monte Carlo Step check box and leave the other defaults unchanged. You can then close the Analysis Preferences control panel.

7.   Perform the diversity selection

Click the SELECT pushbutton in the Select Diverse control panel.

This starts the diversity selection using a Monte Carlo optimization. Models are selected and rejected so as to optimize the diversity function. Please refer to the Theory section for more details.
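The general idea can be sketched as follows. This is a simplified, hypothetical version: it scores a subset by its smallest pairwise distance (a MaxMin-style function), proposes random swaps, and accepts a swap only if the score does not decrease. The actual acceptance rule, diversity function details, and step count used by Cerius2 are not given here, so every name and default below is an assumption.

```python
import math
import random

def maxmin_score(points, subset):
    """Smallest pairwise distance within the subset (larger is more diverse)."""
    return min(math.dist(points[i], points[j])
               for a, i in enumerate(subset) for j in subset[a + 1:])

def monte_carlo_select(points, n_select, n_steps=2000, seed=0):
    """Select n_select (>= 2) points by random swaps that never lower the score."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    selected = rng.sample(indices, n_select)
    score = maxmin_score(points, selected)
    for _ in range(n_steps):
        pool = [i for i in indices if i not in selected]
        if not pool:
            break
        trial = list(selected)
        # Replace one selected point with a randomly chosen unselected one.
        trial[rng.randrange(n_select)] = rng.choice(pool)
        trial_score = maxmin_score(points, trial)
        if trial_score >= score:  # zero-temperature acceptance (assumption)
            selected, score = trial, trial_score
    return selected, score

# Hypothetical 3D property space: a 5x5x5 grid of points.
points = [(x, y, z) for x in range(5) for y in range(5) for z in range(5)]
selected, score = monte_carlo_select(points, n_select=8)
print(sorted(selected), round(score, 3))
```

Recording `score` at each step would give a curve comparable in spirit to the Plot Diversity vs Monte Carlo Step option.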

The selected models are highlighted in the study table and appear in red in the Samples Plot graph.

Note

There is no combinatorial constraint in this procedure, so the resulting selection of products does not map to an array of reagents.

You can now close the Select Diverse control panel.

Cell-based selections

8.   Continuing from the previous example

9.   Select diverse compounds

Go to the LIBRARY ANALYSIS card and select the Select Molecules Diverse Cell-based item. This opens the Cell-Based Selection control panel. Set Number of Molecules per Occupied Cell to 250. Check Plot Cells in 3D Space.

These settings bin the space over a 6x6x6 grid (216 cells).

Click the Bin Space pushbutton in the Cell-Based Selection control panel.

This starts the diversity selection using the cell-based technique. The selection procedure selects one compound per occupied cell. In this example, you obtain a selection of 95 models.
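A minimal sketch of the binning idea, assuming equal-width bins along each axis between the observed minimum and maximum, and assuming the first compound encountered in each occupied cell is the one kept (the tutorial does not say how the compound within a cell is chosen). The function names are hypothetical.

```python
def bin_space(points, bins_per_axis=6):
    """Assign each point to a cell of a regular grid; return {cell: [indices]}."""
    dim = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dim)]
    highs = [max(p[k] for p in points) for k in range(dim)]
    cells = {}
    for idx, p in enumerate(points):
        cell = []
        for k in range(dim):
            span = highs[k] - lows[k]
            b = int((p[k] - lows[k]) / span * bins_per_axis) if span > 0 else 0
            # Points on the upper boundary fall into the last bin.
            cell.append(min(b, bins_per_axis - 1))
        cells.setdefault(tuple(cell), []).append(idx)
    return cells

def one_per_cell(cells):
    """Keep one compound per occupied cell (here simply the first one found)."""
    return [members[0] for members in cells.values()]

# Hypothetical points in PC1/PC2/PC3 space.
points = [(0, 0, 0), (0.1, 0.1, 0.1), (6, 6, 6), (3, 3, 3)]
cells = bin_space(points, bins_per_axis=6)
print(len(cells), one_per_cell(cells))
```

With a 6x6x6 grid there are at most 216 cells, and the number of compounds selected equals the number of occupied cells, which is why the binning should be tuned toward the desired selection size.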

The models obtained are selected in the study table and are colored red in the Samples Plot graph.

Note

As a general rule, you should try to adjust the binning of descriptor space so that the number of occupied cells comes close to the number of desired models.

Close the Cell-Based Selection control panel.

Selecting similar compounds

10.   Continuing from the previous example

11.   The lead follow-up procedure

It is sometimes useful to find which models in a dataset are similar to a given model. For example, one model that is part of the screening set is identified as a hit. You may want to look for similar compounds in the complete library to investigate potentially more active compounds around that lead.

12.   Re-plot the samples in PCA space

From the Study Table control panel's tool bar, select the 3D Plot icon and use the 3D Plot Samples control panel to set X, Y, and Z to PC1, PC2, and PC3, respectively. Set the Label Row by popup to Pickable. Do not change the other defaults in the 3D Plot Samples control panel. Then click the 3D Plot pushbutton.

The 3D graph appears in the Cerius2 Models window.

13.   Select a compound in property space

You can select a compound in the 3D property space by simply clicking a point in the graph. You then see the corresponding row highlighted in the study table. The pickable cross is also highlighted, and the name of the compound appears in the Cerius2 text window.

14.   Select similar compounds

Go to the LIBRARY ANALYSIS card and select the Select Molecules Similar item to open the Select Similar control panel. Set the molecules similar to value to 218 and enter 20 next to Number of molecules.

Note

The panel also indicates Getting data from Study Table, which is the standard mode of operation. We will see later how to use binary data files to handle large libraries.

These settings define compound number 218 as the reference model for which you want to find 20 similar models.

Click the SELECT pushbutton in the Select Similar control panel.

This starts the similarity selection using a distance-based technique. This procedure allows you to perform similarity searches in any property space.

The compounds found are selected in the study table and appear in red in the Samples Plot graph. The distance of each model from the reference model (compound 218) is also listed in the text window.
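A distance-based similarity search of this kind amounts to ranking all compounds by their distance to the reference in the chosen property space. The sketch below assumes Euclidean distance and excludes the reference itself from the results; both choices, and the function name, are assumptions rather than documented Cerius2 behavior.

```python
import math

def select_similar(points, reference, n_similar):
    """Return (index, distance) for the n_similar points nearest the reference."""
    ranked = sorted(
        (math.dist(points[reference], p), i)
        for i, p in enumerate(points)
        if i != reference
    )
    return [(i, d) for d, i in ranked[:n_similar]]

# Hypothetical property-space coordinates; index 0 is the reference.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5), (0, 0, 0.5)]
print(select_similar(points, reference=0, n_similar=2))  # [(4, 0.5), (1, 1.0)]
```

In the tutorial the reference is compound 218 and 20 neighbors are requested; because the search only needs coordinates, the same ranking works in any property space, not just PCA space.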




Last updated May 19, 2000 at 01:50PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.