| Combi-Chem |

Identification of outliers
1. Start a new Cerius2 session
| Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item. |
3. Recall the benzodiazepine library
| Run a simple PCA analysis as explained in the previous example, Using principal component analysis. Remember to set Label Row by to Star in the 3D Plot Samples control panel before starting the analysis. |
If you forgot to set the plot options, you can re-plot the compounds using the same 3D Plot tool.
| Go to the LIBRARY ANALYSIS card and select the Property Space Outliers item to open the Property Space Outliers control panel. |
| Click the Identify Outliers in Property Space action button on the Property Space Outliers control panel. |
6. Visualize outlier compounds
Only those four rows now appear.
| Click the Show Selected Molecules in Models Window action button in the Show Selected Molecules control panel. |
The corresponding models are brought back from the original SD file.
The Property Space Outliers control panel provides several options for working with outliers. They may be deleted from the study table or simply removed from the observations.
Selecting diverse compounds
Cluster-based selections
1. Start a new Cerius2 session
| Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item. |
3. Recall a table of aromatic amine reagents
From this list of molecules, our goal is to group those reagents into families and then select a representative member from each family. This is the basis for library subsetting using diverse selections of reagents. This method is particularly effective when dealing with large arrays of reagents for which full enumeration of the library would be prohibitive.
| Select the columns labeled PC1, PC2 and PC3 and identify these columns as the independent variable by clicking the Set independent icon on the Study Table control panel's tool bar. |
5. Visualizing compounds in PCA space
A representation of the compounds in PCA space appears in the Cerius2 Models window.
| Select CLUSTER from the popup to the right of the RUN pushbutton in the Study Table control panel. |
| Click the RUN pushbutton on the Study Table control panel. |
Two graphs appear in the Cerius2 Graphs window. The graph on the left (Dendrogram) represents the families of compounds, with each branch representing a family. Going down the dendrogram, each family can be traced to its member compounds. The graph on the right (Objective function) may be used to identify the natural break points (regions with large slope) in the dataset.
| In the Statistical Method Preferences panel, enter 30 as the number of clusters and press the ccbutton. |
The Dendrogram now reflects the identification of 30 clusters. A cross section of the Dendrogram is taken so that it intersects 30 branches, representing 30 clusters. The representation of the compounds in PCA space is also updated so that clusters are color-coded.
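The idea of cutting a cluster tree at 30 families and taking one representative per family can be sketched in a few lines of Python. This is only an illustration, not the Cerius2 implementation: the average-linkage rule, the "closest to the centroid" choice of representative, the function names, and the random PC scores are all assumptions made for this example.

```python
import math
import random

def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering, stopped at n_clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        pairs = ((linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def representative(points, members):
    """Return the member index closest to the cluster centroid (an assumed rule)."""
    dims = len(points[0])
    centroid = [sum(points[i][d] for i in members) / len(members)
                for d in range(dims)]
    return min(members, key=lambda i: math.dist(points[i], centroid))

random.seed(0)
# Hypothetical PC1, PC2, PC3 scores for 60 reagents (made-up data).
scores = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
          for _ in range(60)]

clusters = agglomerate(scores, n_clusters=30)
picks = [representative(scores, c) for c in clusters]
print(len(clusters), "clusters,", len(picks), "representative reagents")
```

Stopping the merging at 30 clusters plays the same role as the cross section through the dendrogram: every branch below the cut becomes one family.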
Distance-based selections
1. Start a new Cerius2 session
| Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item. |
3. Recall the benzodiazepine library
| Run a simple PCA analysis as explained in a previous example, Using principal component analysis. Remember to set Label Row by to Star in the 3D Plot Samples control panel before starting the analysis. |
If you forgot to set the plot options, you can re-plot the compounds using the same 3D Plot tool.
Go to the LIBRARY ANALYSIS card and select the Select Molecules Diverse Distance-based menu item to open the Select Diverse control panel. Change the number of Molecules using... to 100.
This selects 100 compounds using the MaxMin (Maximum Dissimilarity) metric. For detailed information on diversity metrics (diversity target functions), please refer to the Theory section.
7. Perform the diversity selection
| Click the SELECT pushbutton in the Select Diverse control panel. |
This starts the diversity selection using a Monte Carlo optimization. Models are selected and rejected so as to optimize the diversity function. Please refer to the Theory section for more details.
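A minimal sketch of this kind of optimization, assuming the MaxMin function is the smallest pairwise distance within the selection and that a trial swap is kept only when it does not lower that value. The acceptance rule, step count, function names, and random data are assumptions for illustration; the actual Cerius2 procedure is described in the Theory section.

```python
import math
import random

def min_pairwise_distance(points, subset):
    """MaxMin objective: smallest distance between any two selected points."""
    return min(math.dist(points[a], points[b])
               for k, a in enumerate(subset) for b in subset[k + 1:])

def monte_carlo_select(points, n_select, n_steps=1000, seed=0):
    """Swap selected and unselected points at random, keeping non-worsening swaps."""
    rng = random.Random(seed)
    selected = rng.sample(range(len(points)), n_select)
    chosen = set(selected)
    pool = [i for i in range(len(points)) if i not in chosen]
    best = min_pairwise_distance(points, selected)
    for _ in range(n_steps):
        s = rng.randrange(len(selected))
        p = rng.randrange(len(pool))
        selected[s], pool[p] = pool[p], selected[s]      # trial swap
        score = min_pairwise_distance(points, selected)
        if score >= best:
            best = score                                  # accept the swap
        else:
            selected[s], pool[p] = pool[p], selected[s]  # reject and undo
    return selected, best

data_rng = random.Random(1)
# Hypothetical 3D property-space coordinates for 200 compounds (made-up data).
library = [(data_rng.random(), data_rng.random(), data_rng.random())
           for _ in range(200)]

subset, score = monte_carlo_select(library, n_select=20)
print("selected", len(subset), "compounds; smallest pairwise distance:", score)
```

Because only non-worsening swaps are kept, this variant can stall in a local optimum; a temperature-based acceptance rule is a common alternative.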
Because there is no combinatorial constraint in this procedure, the resulting selection of products does not map to an array of reagents. You can now close the Select Diverse control panel.
Cell-based selections
8. Continuing from the previous example
These settings bin the space over a 6x6x6 grid (216 cells).
| Click the Bin Space pushbutton in the Cell-Based Selection control panel. |
This starts the diversity selection using the cell-based technique. The selection procedure selects one compound per occupied cell. In this example, you obtain a selection of 95 models.
As a general rule, you should try to adjust the binning of descriptor space so that the number of occupied cells comes close to the number of desired models.
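The binning idea can be sketched as follows, assuming equal-width bins along each axis and assuming that the first compound encountered in a cell is the one kept (the source does not say which compound Cerius2 picks within a cell). The function name and the random data are hypothetical.

```python
import random

def cell_based_select(points, bins_per_axis=6):
    """Map each occupied grid cell to the index of one compound inside it."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        cell = []
        for d in range(dims):
            span = highs[d] - lows[d] or 1.0         # guard against a flat axis
            k = int((p[d] - lows[d]) / span * bins_per_axis)
            cell.append(min(k, bins_per_axis - 1))   # axis maximum goes in last bin
        return tuple(cell)

    occupied = {}
    for i, p in enumerate(points):
        occupied.setdefault(cell_of(p), i)           # keep first compound per cell
    return occupied

rng = random.Random(2)
# Hypothetical clustered 3D property-space coordinates (made-up data).
compounds = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
             for _ in range(500)]

picked = cell_based_select(compounds, bins_per_axis=6)
print(len(picked), "occupied cells out of", 6 ** 3)
```

Changing `bins_per_axis` changes the number of occupied cells, which is the adjustment the rule above refers to.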
| Close the Cell-Based Selection control panel. |
Selecting similar compounds
10. Continuing from the previous example
11. The lead follow-up procedure
It is sometimes useful to find which models in a dataset are similar to a given model. For example, one model that is part of the screening set is identified as a hit. You may want to look for similar compounds in the complete library to investigate potentially more active compounds around that lead.
The 3D graph appears in the Cerius2 Models window.
You can select a compound in the 3D property space by simply clicking a point in the graph. You then see the corresponding row highlighted in the study table. The pickable cross is also highlighted, and the name of the compound appears in the Cerius2 text window.
Go to the LIBRARY ANALYSIS card and select the Select Molecules Similar item to open the Select Similar control panel. Set the molecules similar to field to 218 and enter 20 next to Number of molecules.
The panel also indicates Getting data from Study Table, which is the standard mode of operation. We will see later how to use binary data files to handle large libraries.
| Click the SELECT pushbutton in the Select Similar control panel. |
This starts the similarity selection using a distance-based technique. This procedure allows you to perform similarity searches in any property space.
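A distance-based similarity search of this kind amounts to a nearest-neighbor lookup around the chosen compound. The sketch below assumes Euclidean distance and excludes the reference compound from its own result list; both choices, the function name, and the random data are assumptions, and the index 218 is a zero-based Python index used only to echo the tutorial's value.

```python
import math
import random

def select_similar(points, reference, n_select):
    """Return the n_select row indices nearest to the reference row."""
    others = [i for i in range(len(points)) if i != reference]
    others.sort(key=lambda i: math.dist(points[i], points[reference]))
    return others[:n_select]

rng = random.Random(3)
# Hypothetical 3D property-space coordinates for 300 compounds (made-up data).
table = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
         for _ in range(300)]

neighbors = select_similar(table, reference=218, n_select=20)
print("20 nearest neighbors of row 218:", neighbors)
```

The same function works in any property space, since it depends only on the coordinates supplied for each compound.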