| Combi-Chem |

Visualizing compounds in descriptor space
6. Start a new Cerius2 session
| Go to the LIBRARY ANALYSIS card located in the COMBI-CHEM I card deck and select the Show Study Table menu item. |
8. Recall the benzodiazepine library
9. Select descriptor columns from the study table
| Select the columns MW, Rotlbonds, and AlogP. |
These columns are now highlighted.
The 3D graph appears in the Cerius2 Models window.
| You can select compounds in the 3D property space by dragging a rectangle around a group of points in the graph with the left mouse button. |
The corresponding rows become highlighted in the study table.
The system goes back to the original SD file and restores the models.
Using principal component analysis
12. Continue from the previous example
If you have just finished the previous section, you have the table of benzodiazepines open with the 50 default descriptors defined as independent variables.
| Select PCA from the popup to the right of the RUN pushbutton in the Study Table control panel. |
| Scroll the current table to the right to view these new columns. |
A printout appears in the text window (you may have to scroll up to see it) that lists, among other things, the percentage of variance explained (SSVAR Explained) by the components. Here, the three components together explain over 90% of the variance of the original dataset; that is, more than 90% of the information in the 50 descriptors is captured by the first three principal components.
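The variance-explained figure can be reproduced outside Cerius2 as a rough cross-check. The following is a minimal NumPy sketch, assuming the descriptor columns are autoscaled (mean-centred, unit variance) before PCA; Cerius2's own preprocessing and its SSVAR Explained calculation may differ. The function name pca_explained_variance is hypothetical.

```python
import numpy as np

def pca_explained_variance(X, n_components=3):
    """Return PC scores and the fraction of total variance each component explains.

    X: (n_samples, n_descriptors) array. Columns are autoscaled first
    (an assumption; the Cerius2 preprocessing may differ).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each descriptor
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0                     # leave constant columns unscaled
    Z = Xc / std
    # SVD of the scaled matrix: squared singular values are proportional to PC variances
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    frac = var / var.sum()                  # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]
    return scores, frac[:n_components]
```

Summing the returned fractions gives the cumulative figure the text window reports for the first few components.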
| Display the plots by clicking the diamond to the left of the model name in the Model Manager. |
In these plots, samples or descriptors that overlap or lie close together may be redundant and less interesting than those that are well separated from the others. The plots are therefore a useful tool for discovering (and possibly removing) redundant models or descriptors.
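Descriptors whose points nearly coincide in a loadings plot are typically strongly correlated. A simple numerical counterpart is to flag descriptor pairs whose absolute correlation exceeds a threshold. This is a swapped-in check, not the Cerius2 procedure; the function name redundant_pairs and the 0.95 cutoff are hypothetical choices.

```python
import numpy as np

def redundant_pairs(X, names, threshold=0.95):
    """List descriptor pairs whose absolute Pearson correlation is >= threshold.

    X: (n_samples, n_descriptors) array; names: one label per column.
    Constant columns yield NaN correlations and are never flagged.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # descriptor-by-descriptor correlation matrix
    pairs = []
    n = R.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(R[i, j])))
    return pairs
```

One member of each flagged pair is a candidate for removal, subject to chemical judgment.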
Using multi-dimensional scaling
1. Start a new Cerius2 session
| Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item. |
3. Recall the benzodiazepine library
4. Identify the fingerprint column from the study table
Fingerprint descriptors are difficult to visualize because of their high dimensionality. Multidimensional scaling (MDS) places compounds in a Cartesian space of arbitrary dimension so that the inter-compound distances are reproduced as accurately as possible. For more information on MDS, consult the Combinatorial Chemistry Methodologies and Theory sections of this documentation.
| Select MDS from the popup to the right of the RUN pushbutton in the Study Table control panel. |
| Scroll the current table to the right to view these new columns. |
A printout appears in the text window that reports, among other things, the percentage of variance explained (SSVAR Explained) by the new Euclidean distance space. Here, the three dimensions together explain 78% of the variance of the original dataset; that is, 78% of the inter-compound distance information contained in the fingerprints is captured by the first three dimensions.
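To make the idea concrete, the following sketch computes Tanimoto distances between binary fingerprints and embeds them with classical (Torgerson) MDS. Classical MDS is a swapped-in variant; the MDS algorithm Cerius2 uses may differ, and the "explained" value here (share of positive eigenvalues kept) is only an analogue of SSVAR Explained. Function names are hypothetical.

```python
import numpy as np

def tanimoto_distance_matrix(fps):
    """fps: (n, nbits) 0/1 array. Returns the matrix of 1 - Tanimoto similarity."""
    A = np.asarray(fps, dtype=float)
    inter = A @ A.T                                  # bits shared by each pair
    counts = A.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Two empty fingerprints are treated as identical (similarity 1)
    sim = np.divide(inter, union, out=np.ones_like(inter), where=union > 0)
    return 1.0 - sim

def classical_mds(D, k=3):
    """Embed a distance matrix D in k dimensions; return coords and variance fraction."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                      # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]                  # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    pos = np.clip(evals, 0, None)                    # negative eigenvalues carry no variance
    coords = evecs[:, :k] * np.sqrt(pos[:k])
    explained = pos[:k].sum() / pos.sum()
    return coords, explained
```

For distances that are exactly Euclidean in three dimensions, the embedding recovers them and the explained fraction is essentially 1; fingerprint distances are not Euclidean, so the fraction is lower, as with the 78% reported above.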
3D Pharmacophore Fingerprinting (C2·3DKeys)
A 3D fingerprint for a molecule is defined as the collection of all combinations of three potential pharmacophore features (triplets) or four features (quartets) in 3D space, taken over all conformers. Each triplet or quartet is characterized by its feature types and the corresponding inter-feature distances. This tutorial shows you how to generate 3D fingerprints, examine the pharmacophores identified for each structure, and use the pharmacophores for similarity searching and library comparison.
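The triplet/quartet definition can be sketched as follows: for every conformer, take each combination of k features, bin all pairwise distances, and store a canonical key that does not depend on the order in which the features are listed. The feature type names, the 1 Å bin width, the use of all pairwise distances, and the min-over-permutations canonicalization are assumptions for illustration; the Catalyst binning scheme and the mapping of keys to fingerprint bits are not shown.

```python
from itertools import combinations, permutations
import math

def pharmacophore_keys(conformers, k=3, bin_width=1.0):
    """Collect canonical k-point pharmacophore keys over all conformers.

    conformers: list of conformers; each is a list of (feature_type, (x, y, z)).
    Each key is (feature types, binned pairwise distances).
    """
    keys = set()
    for feats in conformers:
        for combo in combinations(feats, k):
            best = None
            # Canonical form: the smallest key over all orderings of the k features
            for perm in permutations(combo):
                types = tuple(t for t, _ in perm)
                dists = tuple(
                    int(math.dist(perm[i][1], perm[j][1]) // bin_width)
                    for i in range(k) for j in range(i + 1, k)
                )
                key = (types, dists)
                if best is None or key < best:
                    best = key
            keys.add(best)
    return keys
```

Calling it with k=4 gives the quartet variant. Because the keys are pooled over conformers, a pharmacophore counts as present if any conformer exhibits it.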
For this tutorial to work correctly, you must have the Catalyst 450 environment set up in the shell running Cerius2, and the Catalyst database server must be running.
First, a conformer database must be built from the library structures. This can be done from the command line or from within Cerius2.
With the Hosts command you can list all the CPUs you want to use on the network. For example, Hosts="localhost(2), machine1" uses two CPUs on the local machine and one on machine1.
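To spell out the Hosts syntax, the following hypothetical parser turns a host list into (host, CPU count) pairs, treating a host without a parenthesized count as one CPU. It only illustrates the notation in the example; it is not the parser Cerius2 uses, and other syntax the real command may accept is not covered.

```python
import re

def parse_hosts(spec):
    """Parse 'localhost(2), machine1' into [('localhost', 2), ('machine1', 1)]."""
    hosts = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # host name, optionally followed by a CPU count in parentheses
        m = re.fullmatch(r"([^()\s]+)\s*(?:\((\d+)\))?", item)
        if m is None:
            raise ValueError(f"cannot parse host entry: {item!r}")
        hosts.append((m.group(1), int(m.group(2) or 1)))
    return hosts
```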
The second stage is to identify the surface-accessible features.
Note that it is possible to generate the features file from the command line.
In this step, a binary 3D fingerprint file is generated from the features file.
Note that to create 4-point pharmacophores, you must check the Create 4 feature pharmacophores box before generating the fingerprints.
5. Similarity searching (lead follow-up) using 3D fingerprints
This selects the 20 molecules most similar to molecule 1 in terms of their 3-point pharmacophore patterns.
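A lead follow-up search of this kind amounts to ranking the library by similarity to the query fingerprint and keeping the top entries. The sketch assumes the Tanimoto coefficient on sets of on-bits (or pharmacophore keys); the tutorial does not state which similarity metric Cerius2 applies, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0                      # two empty fingerprints count as identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def most_similar(query, library, n=20):
    """Return (index, similarity) for the n library members most similar to query."""
    scored = [(i, tanimoto(query, fp)) for i, fp in enumerate(library)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:n]
```

If the query molecule is itself in the library, it ranks first with similarity 1.0.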
| Check both the Compare Individual Molecules to Reference Library and the Create Histogram of New and Common Pharmacophores checkboxes and push the Compare Libraries button. |
The libraries are compared by counting the distinct pharmacophores present in the candidate and reference libraries.
A report in the text port lists the 10 molecules that, if added to the reference library, would contribute the most new pharmacophore triangles (those not already present in the reference library); these rows are highlighted in the study table.
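The comparison and the "most new pharmacophores" report can be sketched with set operations on per-molecule pharmacophore sets. The greedy selection below, which updates the covered set after each pick so that two molecules contributing the same new pharmacophores are not both chosen, is an assumption; Cerius2 may instead rank each molecule independently. Function names are hypothetical.

```python
def compare_libraries(candidate, reference):
    """candidate, reference: lists of per-molecule pharmacophore sets."""
    cand_all = set().union(*candidate)
    ref_all = set().union(*reference)
    return {
        "candidate": len(cand_all),
        "reference": len(ref_all),
        "common": len(cand_all & ref_all),
        "new": len(cand_all - ref_all),     # present only in the candidate library
    }

def pick_most_new(candidate, reference, n=10):
    """Greedily pick up to n candidates that add the most not-yet-covered pharmacophores."""
    covered = set().union(*reference)
    chosen = []
    remaining = set(range(len(candidate)))
    for _ in range(n):
        best, gain = None, 0
        for i in sorted(remaining):
            g = len(candidate[i] - covered)
            if g > gain:
                best, gain = i, g
        if best is None:                    # nothing left adds a new pharmacophore
            break
        chosen.append((best, gain))
        covered |= candidate[best]
        remaining.discard(best)
    return chosen
```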
A new diversity and similarity metric has been implemented based on a "modal fingerprint" for a set of N molecules, in which a bit is "On" if it is on in at least one molecule of the set. The metric works for both 2D and 3D fingerprints.
| Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select the Show Study Table menu item. |
This selects a library designed to maximize the number of on bits in the modal fingerprint of the selected library. At the end of the optimization, the optimum value reported in the text box is the fraction of bits covered: the number of bits set in the modal fingerprint of the subset library divided by the number of bits set in the virtual library.
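With the definition above, the modal fingerprint is a bitwise OR over the set, and the reported optimum value is a ratio of bit counts. The sketch stores fingerprints as Python integers used as bit vectors and picks the subset greedily; the greedy search is a swapped-in stand-in for whatever optimizer Cerius2 actually runs, and the function names are hypothetical.

```python
from functools import reduce

def modal_fingerprint(fps):
    """Bitwise OR: a bit is on if it is on in at least one fingerprint."""
    return reduce(lambda a, b: a | b, fps, 0)

def popcount(x):
    """Number of on bits in an integer bit vector."""
    return bin(x).count("1")

def coverage_subset(fps, size):
    """Greedily choose `size` molecules maximizing the on bits of their modal fingerprint.

    Returns (chosen indices, fraction of the full library's modal-fingerprint bits covered).
    """
    full = popcount(modal_fingerprint(fps))
    chosen, mfp = [], 0
    for _ in range(min(size, len(fps))):
        best = max((i for i in range(len(fps)) if i not in chosen),
                   key=lambda i: popcount(mfp | fps[i]))
        chosen.append(best)
        mfp |= fps[best]
    return chosen, (popcount(mfp) / full if full else 0.0)
```

For example, with fingerprints 0b0011, 0b0110, 0b1000 and 0b0001, a two-molecule subset covers three of the library's four bits (0.75).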
Fingerprinting and clustering (C2·LibEngine)
C2·LibEngine uses a Markush representation of a library to rapidly compute Lipinski properties and fingerprints and, optionally, to run clustering on large combinatorial libraries. This is done without enumerating the library structures, yet it produces exact property values, and fingerprints that include all the fragments crossing R-group boundaries. The input is an MDL RG file, which may be exported from the analog builder (see the C2·Analog tutorial) or obtained from other software such as MDL's Project Library or Central Library. The software can optionally enumerate a library to SMILES, but this is not required to calculate the properties or fingerprints.
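For an additive property such as molecular weight, the Markush idea can be illustrated by combining per-fragment numbers instead of building product structures. The sketch assumes each R-group value is the mass the substituent contributes once attached and that the core mass excludes its open R positions; this convention, the function name, and the numbers are hypothetical.

```python
from itertools import product

def markush_mw(core_mw, rgroup_mws):
    """Yield (reagent indices, MW) for every product of a Markush library.

    core_mw: scaffold mass with its R positions open (hypothetical convention).
    rgroup_mws: one list per R position of substituent mass contributions.
    Only numbers are combined; no product structure is built.
    """
    for combo in product(*(range(len(r)) for r in rgroup_mws)):
        mw = core_mw + sum(r[i] for r, i in zip(rgroup_mws, combo))
        yield combo, mw
```

Fingerprints are harder: fragments that span an R-group boundary depend on the neighbouring substituents, which is why the tutorial stresses that LibEngine includes them; a per-fragment sum like this one does not capture that.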
| Go to the LIB ENGINE card located on the COMBICHEM-II card deck and select the Setup and Run menu item. |
| Go to the BINARY DATA FILES card located on the COMBICHEM-I card deck and click Select BDF. |
| Click lib3.bdf to see the contents of the file and select the item BCI_FP bci1052.dic in the list box on the lower left. |
These fingerprints can now be used in any similarity/diversity selection method. In this case a combinatorial selection will be made using the new Fingerprint On-bits metric.
| Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select Restraints-->Property Ranges. |
| Click the Get Properties button, and select SlogP. Set the Upper Bound to 5 and click ADD. Select MW, set the Upper Bound to 500 and click ADD. |
| Click LIST to check that the properties have been added as you specified. |
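The two restraints just defined (SlogP at most 5, MW at most 500) can be read as a range check on each product's properties. The sketch treats the restraints as hard filters, which is an assumption; the subsetting optimizer may apply them differently (for example, as penalties). The function name and data layout are hypothetical.

```python
def within_ranges(props, ranges):
    """Check a product's properties against (lower, upper) bounds; None means unbounded.

    props: dict of property name -> value; ranges: dict of name -> (lower, upper).
    """
    for name, (lo, hi) in ranges.items():
        v = props[name]
        if lo is not None and v < lo:
            return False
        if hi is not None and v > hi:
            return False
    return True

# Restraints from this step: upper bounds only
RANGES = {"SlogP": (None, 5.0), "MW": (None, 500.0)}
```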
| On the LIBRARY ANALYSIS card, select Rgroup Subsetting --> Diverse Library. |
This selects a 2 × 4 × 8 subset (2, 4, and 8 reagents at the three R-group positions), reports the selected reagents in the text port, and creates a BDF file of the 64 selected products.
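A combinatorial subset contains every combination of the chosen reagents, one per R position, so its size is the product of the per-position counts: 2 × 4 × 8 = 64. The sketch below uses hypothetical reagent names to make the arithmetic explicit.

```python
from itertools import product

def rgroup_subset_products(selected):
    """selected: one list of chosen reagents per R position.

    Returns every product (one reagent per position) of the combinatorial subset.
    """
    return list(product(*selected))

# Hypothetical reagent labels for a 2 x 4 x 8 selection
selected = [
    [f"R1_{i}" for i in range(1, 3)],
    [f"R2_{i}" for i in range(1, 5)],
    [f"R3_{i}" for i in range(1, 9)],
]
```

This is why a combinatorial subset is cheaper to synthesize than a cherry-picked set of the same size: only 2 + 4 + 8 = 14 reagents are needed for 64 products.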
| Click lib3_RgDiv.bdf to see the contents of the file and select the mol_wt and SlogP items (Control-Click to make multiple selections). |
| Go up to the DRUG DISCOVERY card deck, select the QSAR card and click Preferences-->Histograms.... |
| Click Create Histogram and examine the distributions of molecular weight and logP in the subset that was selected. |
2. Generating a cluster file and using it for combinatorial subsetting.
In this section, relocation clustering is run directly from the Markush structure, with fingerprints generated on the fly.
| Go up to the COMBICHEM-II card deck, select the Lib Engine card and click Setup and Run. |
This will generate the fingerprints for the 5760 members of the library and then run k-means relocation clustering to produce 64 clusters.
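The clustering step can be sketched as k-means on fingerprints treated as 0/1 vectors. This uses standard batch (Lloyd) k-means with Euclidean distance as a swapped-in stand-in; the relocation algorithm and distance measure LibEngine uses are not documented here, and the function name is hypothetical. Distances are computed from dot products so memory stays proportional to molecules × clusters.

```python
import numpy as np

def kmeans_fingerprints(fps, k, n_iter=50, seed=0):
    """Batch k-means on binary fingerprints (rows of a 0/1 array).

    Returns (labels, centroids).
    """
    X = np.asarray(fps, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()  # random initial centres
    labels = np.full(len(X), -1)
    x2 = (X ** 2).sum(axis=1)[:, None]
    for _ in range(n_iter):
        # Squared Euclidean distance: |x|^2 - 2 x.c + |c|^2
        d = x2 - 2.0 * X @ centroids.T + (centroids ** 2).sum(axis=1)[None, :]
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # assignments are stable
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):                         # empty clusters keep their old centre
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```

With k=64 and the 5760 library fingerprints, each cluster would hold 90 members on average, although real clusters are rarely that even.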
| Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select Rgroup Subsetting --> Diverse Library. |
This selects a 2 × 4 × 8 subset (2, 4, and 8 reagents at the three R-group positions), reports the selected reagents in the text port, and creates a BDF file of the 64 selected products.