MSI Product Previous Next Contents Index Top
Combi-Chem


3c. Library analysis and visualization


Visualizing compounds in descriptor space

6.   Start a new Cerius2 session

If you have just finished the previous section, start a new session by selecting File/New Session from the main Visualizer menu bar. Click Confirm in the Re-initialize All message window. You can also start from a brand new Cerius2 session.

7.   Open an empty study table

Go to the LIBRARY ANALYSIS card located in the COMBI-CHEM I card deck and select the Show Study Table menu item.

8.   Recall the benzodiazepine library

Open the Open Study Table control panel by selecting the File/Open... item from the menu bar of the Study Table control panel. Select the table benzo.tbl and click the OPEN pushbutton. Alternatively, if you did not complete the earlier part of the tutorial, select the table ./Cerius2-Resources/COMBICHEM/demos/benzo_720.tbl and click the OPEN pushbutton.

9.   Select descriptor columns from the study table

Select the columns MW, Rotlbonds, and AlogP.

These columns are now highlighted.

10.   Display the compounds in property space

From the tool bar (below the menu bar) in the Study Table control panel, click the 3D Plot icon. In the 3D Plot Samples control panel that appears, click the Set XYZ using Selected Columns action button. Set the Label Row by popup to Pickable. Do not change the other defaults in the 3D Plot Samples control panel. Then click the 3D Plot pushbutton.

The 3D graph appears in the Cerius2 Models window.

11.   Select compounds in property space

You can select compounds in the 3D property space by simply dragging a rectangle around a group of points in the graph using the left mouse button.

The corresponding rows become highlighted in the study table.

If you want to see these models in the Models window, simply select the Molecules/Show Selected... item from the menu bar in the Study Table control panel. Then click the Show Selected Molecules in Models Window action button in the Show Selected Molecules control panel.

The system goes back to the original SD file and restores the models.

Using principal component analysis

12.   Continue from the previous example

If you have just finished the previous section, you have the table of benzodiazepines open with the 50 default descriptors defined as independent variables.

Principal components analysis (PCA) is a data reduction technique commonly used to relieve redundancy among possibly correlated variables. PCA attempts to discover the true dimensionality of the problem by using linear combinations of the original variables that are orthogonal to each other. PCA allows you to visualize most of the variance of the dataset by visualizing the first three principal components.

For more information on PCA, please consult the Combinatorial Chemistry Methodologies and Theory sections of this documentation set.
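As a rough illustration of the idea described above, the following Python sketch extracts principal components from a small descriptor table and reports the fraction of variance each one explains. It is not the Cerius2 implementation: the function name `pca`, the power-iteration method, and the choice to mean-centre the columns without rescaling them are all assumptions made for this example.

```python
import math

def pca(rows, n_components=3, iters=500):
    """Return (components, explained) for a list of equal-length numeric rows.

    Illustrative only: uses power iteration with deflation on the covariance
    matrix, which is adequate for small, well-separated examples.
    """
    n, d = len(rows), len(rows[0])
    # Centre each descriptor column on its mean (no rescaling: an assumption).
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_var = sum(cov[j][j] for j in range(d))
    components, explained = [], []
    for _comp in range(min(n_components, d)):
        # Deterministic, non-uniform start vector, normalised to unit length.
        v = [1.0 / (j + 1) for j in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _it in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along v.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        explained.append(lam / total_var)
        # Deflate so the next pass finds a direction orthogonal to v.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return components, explained

# Made-up data: three strongly correlated descriptor columns.
rows = [[1, 2.0, 0.5], [2, 4.1, 1.0], [3, 6.0, 1.6], [4, 8.1, 2.0], [5, 9.9, 2.4]]
components, explained = pca(rows)
print([round(f, 4) for f in explained])
```

Because the three made-up columns are nearly proportional to one another, almost all of the variance falls on the first component, which is the kind of redundancy PCA is meant to expose.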

13.   Setting up principal component analysis

Select PCA from the popup to the right of the RUN pushbutton in the Study Table control panel.

Then open the Statistical Method Preferences control panel by selecting the Preferences/Statistical Method... item from the menu bar in the Study Table control panel. Make sure Number of Components is equal to 3.

On the tool bar in the Study Table control panel, click the 3D Plot icon to open the 3D Plot Samples control panel (if you closed it earlier). Set the Label Row by popup to Star. Close the 3D Plot Samples control panel without changing the other defaults. Click the RUN pushbutton in the Study Table control panel.

Three new columns (PC1, PC2, and PC3) are added to the right side of the table. The new columns are considered the new independent variables (X#) in the study table.

Scroll the current table to the right to view these new columns.

A printout appears in the text window (you may have to scroll up to see it) which lists, among other things, the percentage of variance explained (SSVAR Explained) by the components. Here, the three components combined explain over 90% of the variance of the original dataset. That is, more than 90% of the information in the 50 descriptors is captured by the first three principal components.

PCA creates two new models, which are actually plots rather than molecular models, and which show views of your data in both sample and descriptor space: PCA Descriptor Plot and PCA Samples Plot. By making these models current with the Model Manager, you can see the proximity of the samples or descriptors to one another.

Display the plots by clicking the diamond to the left of the model name in the Model Manager.

In these plots, samples or descriptors that overlap or are close may be redundant and less interesting than samples or descriptors that are well separated from others. This can be a useful tool for discovering (and possibly removing) redundant models or descriptors.

Using multi-dimensional scaling

1.   Start a new Cerius2 session

If you have just finished the previous section, start a new session by selecting File/New Session from the main Visualizer menu bar. Click Confirm in the Re-initialize All message window. You may also start a brand new Cerius2 session.

2.   Open an empty study table

Go to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and select the Show Study Table item.

3.   Recall the benzodiazepine library

Open the Open Study Table control panel by selecting the File/Open... item from the menu bar in the Study Table control panel. Select the table ./Cerius2-Resources/COMBICHEM/demos/benzo_720.tbl and click the OPEN pushbutton.

4.   Identify the fingerprint column from the study table

Clear the original definition of the independent column by clicking the Clear indep/dep icon on the Study Table control panel's tool bar. Select the column labeled ISIS_key, which contains a set of hexadecimal characters. Identify this column as the independent variable by clicking the Set independent icon on the Study Table control panel's tool bar.

Fingerprint descriptors cannot be easily visualized, because of their high-dimensional nature. Multidimensional scaling (MDS) provides a means of visualizing compounds in an arbitrary Cartesian space where inter-compound distances are reproduced to the best possible accuracy. For more information on MDS, please consult the Combinatorial Chemistry Methodologies and Theory sections of this documentation.
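MDS starts from a matrix of inter-compound distances. The sketch below shows one way such a matrix could be derived from hexadecimal fingerprint strings like those in the ISIS_key column. The Tanimoto distance used here is an assumption chosen for illustration (the tutorial does not state which coefficient Cerius2 applies), and the function names are hypothetical.

```python
def hex_to_bits(hex_key: str) -> int:
    """Interpret a hexadecimal fingerprint string as an integer bit set."""
    return int(hex_key, 16)

def tanimoto_distance(a: int, b: int) -> float:
    """1 - (bits in common / bits set in either); 0.0 if both are empty."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - common / union

def distance_matrix(hex_keys):
    """Full symmetric matrix of pairwise distances between fingerprints."""
    fps = [hex_to_bits(k) for k in hex_keys]
    return [[tanimoto_distance(x, y) for y in fps] for x in fps]

# Made-up keys, far shorter than real ones.
for row in distance_matrix(["F0", "30", "0F"]):
    print(row)
```

An MDS procedure would then look for three coordinates per compound whose Euclidean distances reproduce this matrix as closely as possible.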

5.   Setting up multidimensional scaling

Select MDS from the popup to the right of the RUN pushbutton in the Study Table control panel.

Then open the Statistical Method Preferences control panel by selecting the Preferences/Statistical Method... item from the menu bar in the Study Table control panel and change Statistical Method to MDS.

From the Study Table control panel's tool bar, select the 3D Plot icon. In the 3D Plot Samples control panel, ensure that the Label Row by popup is set to Star. Close the 3D Plot Samples control panel without changing the other defaults. Click the RUN pushbutton on the Study Table control panel.

Three new columns (MDS1, MDS2, and MDS3) are created and added to the end of the table. The independent variable (X) remains set to the fingerprint column.

Scroll the current table to the right to view these new columns.

A printout appears in the text window that reports, among other things, the percentage of variance explained (SSVAR Explained) by the new Euclidean distance space. Here, the three dimensions combined explain 78% of the variance of the original data set. That is, 78% of the inter-compound distance information contained in the fingerprints is captured by the first three dimensions.

This analysis creates a new graph in the Cerius2 Models window, which represents the molecular models in the new MDS space. By making this model current with the Model Manager, you can see the proximity of the samples to one another.

3D Pharmacophore Fingerprinting (C2·3DKeys)

A 3D fingerprint for a molecule is defined as the collection of all combinations of three potential pharmacophore features (triplets) or four features (quartets) in 3D space for all conformers. Each triplet/quartet is characterized by the three/four feature types and the three/four corresponding inter-feature distances. This tutorial will show you how to generate 3D fingerprints, examine the pharmacophores identified for each structure, and use the pharmacophores for similarity searching and library comparison.
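To make the definition concrete, here is a minimal Python sketch that enumerates triplets for a set of conformers and collects them into one fingerprint. The feature labels, the 1.0-unit distance bin width, and the way each triplet is reduced to a canonical key are assumptions for illustration; they are not the encoding used by C2·3DKeys.

```python
import math
from itertools import combinations

def triplet_keys(features, bin_width=1.0):
    """Return the set of triplet keys for one conformer.

    features: list of (feature_type, (x, y, z)) tuples.
    Each vertex is paired with the binned length of the opposite edge, and
    the three pairs are sorted, so the key does not depend on feature order.
    """
    keys = set()
    for (ta, pa), (tb, pb), (tc, pc) in combinations(features, 3):
        pairs = [
            (ta, int(math.dist(pb, pc) // bin_width)),
            (tb, int(math.dist(pa, pc) // bin_width)),
            (tc, int(math.dist(pa, pb) // bin_width)),
        ]
        keys.add(tuple(sorted(pairs)))
    return keys

def fingerprint(conformers, bin_width=1.0):
    """Union of the triplet keys found in any conformer of the molecule."""
    fp = set()
    for conformer in conformers:
        fp |= triplet_keys(conformer, bin_width)
    return fp

# Hypothetical feature types and coordinates for a single conformer.
conformer = [
    ("HBA", (0.0, 0.0, 0.0)),
    ("HBD", (3.0, 0.0, 0.0)),
    ("HYD", (0.0, 4.0, 0.0)),
    ("HBA", (3.0, 4.0, 0.0)),
]
for key in sorted(fingerprint([conformer])):
    print(key)
```

Treating the fingerprint as a set of keys is equivalent to a bit string in which each possible key has its own bit position.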

Note

For this tutorial to work correctly, the Catalyst 450 environment must be set up in the shell running Cerius2, and the Catalyst database server must be running.

1.   Create a 3D Database

First, a conformer database must be built from the library structures. This can be done from the command line or from within Cerius2.

From the command line

Make sure that you have sourced the cshrc script in your local Catalyst installation to set up the Catalyst 450 environment. Then, in the shell in which the Catalyst environment has been set up, type:

>	catDB CONFIG benz_analogs.bdb

>	catDB SD Cerius2-Resources/COMBICHEM/ccI/benz_analogs.sd benz_analogs.bdb MaxConfs=10 Hosts=localhost

With the Hosts argument you can list all the CPUs you wish to use on the network. For example, Hosts="localhost(2), machine1" indicates that two CPUs should be used on the local machine and one on machine1.
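The Hosts syntax can be read as a comma-separated list of machine names, each optionally followed by a CPU count in parentheses. The Python sketch below interprets such a string under that reading. It only illustrates the notation and is not how catDB itself parses the argument; `parse_hosts` is a hypothetical helper.

```python
import re

def parse_hosts(spec: str) -> dict:
    """Map each host name in a Hosts string to its CPU count (default 1)."""
    cpus = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # A name, optionally followed by "(N)" where N is a CPU count.
        match = re.fullmatch(r"([^()\s]+)(?:\((\d+)\))?", entry)
        if match is None:
            raise ValueError(f"unrecognised host entry: {entry!r}")
        cpus[match.group(1)] = int(match.group(2) or 1)
    return cpus

print(parse_hosts("localhost(2), machine1"))
```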

From within Cerius2

Go to the STR BASED FOCUSSING card located on the STR BASED DESIGN card deck and select the Library Focussing menu item. Select the Preferences button next to Generate 3D Virtual Library. Select the browse button and select the file: Cerius2-Resources/COMBICHEM/demos/benz_analogs.sd. Set the output database to benz_analogs.bdb. Close the dialog box and push the Generate 3D Virtual Library button.

2.   Create the Features file

The second stage is to identify the surface accessible features.

Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select the Show Study Table menu item. Select the Descriptors menu in the Study Table and choose the 3D Fingerprints... option. Push the Create Features File button to bring up the Create Features File control panel. First select the features to be included in the fingerprints. Select the benz_analogs.bdb file created in step 1. Push the Create Features File button.

Note that it is possible to generate the features file from the command line.

3.   Create the Fingerprint file

In this step a binary 3D fingerprint file is obtained from the feature file.

If the 3D Fingerprints panel has been closed, reopen it by following the instructions above. Push the Create 3D Fingerprint File button to bring up the Create Binary Fingerprint File control panel. First select the features file benz_analogs.fea. Push the CREATE 3DFINGERPRINTS FILE button at the bottom of the panel. This will create a 3-point pharmacophore fingerprint for each structure. Next, load the fingerprints into the study table. Push the Load 3D Fingerprints File to Study Table button on the 3D Fingerprints panel. Using the Browse... button on the ensuing panel, select the benz_analogs.3pf file for Name of 3D fingerprint file and push the LOAD button.

Note that to create 4-point pharmacophores, you must check the Create 4 feature pharmacophores box before generating the fingerprints.

4.   Selecting similarity coefficients and displaying pharmacophores

Select row 1 in the study table, select the 3D fingerprint column (shift-click), and press the List Pharmacophores for Selected Rows button on the 3D Fingerprints panel to see a listing in the text port of the pharmacophores present in the selected molecule. To see a pictorial representation of the pharmacophores, select the Visualize Pharmacophores button. On the ensuing panel, select Name from the Represent Features By pulldown menu near the bottom. Select a row and column (click on a row; control-click on a fingerprint column) from the study table. Push Browse Pharmacophores for Selected Row and Column and use the arrow controls to move between the pharmacophores shown in the model window.

5.   Similarity searching (lead follow-up) using 3D fingerprints

3D fingerprints may be used for both diversity and similarity selection. This example shows similarity selection.

Select the 3DFP3 (3 point pharmacophore) fingerprint column and mark it as the independent variable (X) column. Go back to the LIBRARY ANALYSIS card and select the Select Molecules -> Similar item. Enter 20 as the Number of molecules to select and push the SELECT button.

This will select the 20 molecules most similar to molecule 1 in terms of their 3-point pharmacophore patterns.

A list of distances (based on the similarity coefficient selected in the 3D fingerprints panel) is shown in the text port. The selected molecules are highlighted in the study table.
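The selection just described amounts to ranking every molecule by its fingerprint distance to a query molecule and keeping the closest few. A minimal Python sketch follows, assuming fingerprints are held as sets of on-bit identifiers and that Tanimoto distance is the coefficient in use. In the tutorial the coefficient is whatever is selected in the 3D Fingerprints panel, and whether the query itself is counted among the selected molecules is not stated, so both choices here are assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def most_similar(fingerprints, query_index=0, k=20):
    """Return [(row_index, distance)] for the k rows closest to the query.

    The query row itself is excluded (an assumption of this sketch).
    """
    query = fingerprints[query_index]
    scored = [
        (1.0 - tanimoto(query, fp), i)
        for i, fp in enumerate(fingerprints)
        if i != query_index
    ]
    scored.sort()  # smallest distance first; ties broken by row index
    return [(i, d) for d, i in scored[:k]]

# Made-up fingerprints: row 0 is the query.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}, {1, 2}]
for row, dist in most_similar(fps, query_index=0, k=2):
    print(row, round(dist, 3))
```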

6.   Library comparison using 3D fingerprints

In this section, two libraries will be compared by using their 3D fingerprints. Libraries are compared by the pharmacophores they have in common.

Go to the LIBRARY COMPARISON card located on the COMBICHEM-I card deck and select the Compare Libraries --> 3D Fingerprints Onbits menu item. On the ensuing control panel, select the benz_analogs.3pf file and push the Select Candidate 3DFP File button. Now, select the file: Cerius2-Resources/COMBICHEM/demos/3dfp/Monopep_RS/monopep_rs.3pf and push the Select Reference 3DFP File button.

Check both the Compare Individual Molecules to Reference Library and the Create Histogram of New and Common Pharmacophores checkboxes and push the Compare Libraries button.

The libraries are compared by summing the number of different pharmacophores present in the candidate and reference library.

Note first the results in the text port. These show the total number of different pharmacophores present in the structures in the two libraries. This shows that there are over 400 pharmacophore triangles that are present in the candidate library that do not occur in any structure in the reference library. It also shows the similarity between the two libraries. This is based on generating a modal (OR) fingerprint for each library and calculating the similarity between this pair of fingerprints.

Next examine the output file LibCompare3DFP.out in a text editor. This compares each individual member of the candidate library to the modal fingerprint of the reference library, showing how many pharmacophores are in each structure that are not found anywhere in the reference. A frequency distribution of this information is shown in the histograms that were generated in the graphs window.
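The comparison described in the preceding paragraphs can be sketched in a few lines of Python. Fingerprints are represented as sets of on-bit identifiers, and the library-level similarity is computed as a Tanimoto coefficient between the two modal (OR) fingerprints. The tutorial does not name the coefficient, so that choice and all function names are assumptions.

```python
from collections import Counter

def modal_fingerprint(library):
    """OR together the fingerprints of every molecule in a library."""
    modal = set()
    for fp in library:
        modal |= fp
    return modal

def compare_libraries(candidate, reference):
    """Summarise how a candidate library differs from a reference library."""
    cand_modal = modal_fingerprint(candidate)
    ref_modal = modal_fingerprint(reference)
    union = cand_modal | ref_modal
    # Per-molecule count of pharmacophores absent from the whole reference.
    per_molecule_new = [len(fp - ref_modal) for fp in candidate]
    return {
        "new_in_candidate": len(cand_modal - ref_modal),
        "common": len(cand_modal & ref_modal),
        "similarity": len(cand_modal & ref_modal) / len(union) if union else 1.0,
        "per_molecule_new": per_molecule_new,
        "histogram": Counter(per_molecule_new),
    }

# Made-up libraries with a handful of bits each.
candidate = [{1, 2, 3}, {3, 4}, {5}]
reference = [{1, 2}, {3}]
print(compare_libraries(candidate, reference))
```

The `histogram` entry corresponds to the frequency distribution mentioned above: it counts how many candidate molecules contribute 0, 1, 2, ... pharmacophores that are new relative to the reference.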

Molecules can be selected from the candidate library to augment the reference library. Remaining in the Compare Libraries control panel, check the Select ... Molecules box and the Highlight Selected Molecules in Study Table box and push Compare Libraries again.

A report in the text port shows the 10 molecules that should be added to the reference library to add the most new pharmacophore triangles not already present in the reference library; these rows are highlighted in the study table.
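One plausible way to choose such molecules is a greedy procedure: repeatedly pick the candidate that adds the most bits not yet covered, then treat its bits as covered. The Python sketch below implements that idea. Whether Cerius2 uses a greedy scheme is not stated in this tutorial, so the algorithm and the function name are assumptions.

```python
def pick_augmenting_molecules(candidate, reference_modal, n_pick=10):
    """Greedily pick candidate rows that add the most new on bits.

    candidate: list of fingerprints (sets of on bits).
    reference_modal: set of bits already present in the reference library.
    Returns [(row_index, bits_gained)] in the order picked.
    """
    covered = set(reference_modal)
    remaining = set(range(len(candidate)))
    picked = []
    while remaining and len(picked) < n_pick:
        # Lowest row index wins ties because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda i: len(candidate[i] - covered))
        gain = len(candidate[best] - covered)
        if gain == 0:
            break  # nothing left that adds a new pharmacophore
        picked.append((best, gain))
        covered |= candidate[best]
        remaining.remove(best)
    return picked

# Made-up data: the reference library already covers bits 1 and 2.
candidate = [{1, 2, 3}, {3, 4, 5, 6}, {5, 6}, {9}]
print(pick_augmenting_molecules(candidate, reference_modal={1, 2}))
```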

Note that when comparing two libraries you should ensure that their 3D fingerprints were built using the same options (features and tolerances) so that equivalent bit positions in the two libraries refer to the same pharmacophoric feature.

7.   Using modal fingerprints to optimize diversity and similarity

A new diversity and similarity metric has been implemented based on generating a "modal fingerprint" for a set of N molecules, in which a bit will be "On" if it is present in at least one of the molecules in the set. It will work for both 2D and 3D fingerprints.

For diversity the library will be designed to maximize the total number of on bits in the modal fingerprint of the selected library. For similarity, it will be designed to maximize the number of common bits between the modal fingerprint for a library and a "target" fingerprint.
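The two scores described above reduce to simple set operations when fingerprints are held as sets of on-bit identifiers. The following Python sketch shows both; the function names are hypothetical and the data are made up.

```python
def modal_fingerprint(molecules):
    """A bit is on in the modal fingerprint if any molecule has it on."""
    modal = set()
    for fp in molecules:
        modal |= fp
    return modal

def diversity_score(molecules):
    """Total number of on bits in the modal fingerprint (to be maximised)."""
    return len(modal_fingerprint(molecules))

def similarity_score(molecules, target):
    """Number of bits shared by the modal fingerprint and a target fingerprint."""
    return len(modal_fingerprint(molecules) & target)

subset = [{1, 2}, {2, 3}]
print(diversity_score(subset), similarity_score(subset, target={2, 3, 4}))
```

Because only the union of bits matters, adding a molecule whose bits are already covered by the set does not change either score.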

Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select the Show Study Table menu item.

Open the study table Cerius2-Resources/COMBICHEM/demos/benzo_720_3fp.tbl. Select the 3D fingerprint column and mark it as the independent variable (X column). Select the Rgroup Subsetting -> Diverse Library item from the Library Analysis card on the COMBICHEM-I card deck. Select Fingerprint OnBits from the Diversity Metric pulldown menu. Enter 3,3,3,3 in the Number of fragments dialog and push the SELECT RGROUP FRAGMENTS button.

This will select a library that is designed to maximize the number of on bits in the modal fingerprint of the selected library. The optimum value reported in the text box is the number of bits set in the modal fingerprint of the subset library divided by the number of bits set in the modal fingerprint of the virtual library, at the conclusion of the optimization.
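For a very small library, the same objective can be illustrated by brute force: try every way of choosing the requested number of fragments at each R-group, build the modal fingerprint of the resulting sub-array, and keep the best. The Python sketch below does this and reports the on-bit fraction relative to the full virtual library. Cerius2 runs an optimization rather than an exhaustive search, so the search strategy, the data layout, and the function name are assumptions for illustration only.

```python
from itertools import combinations, product

def best_subarray(products, n_fragments, picks):
    """Exhaustively find the fragment choice that maximises modal on bits.

    products: dict mapping a tuple of fragment indices (one per R-group)
              to that product's fingerprint (a set of on bits).
    n_fragments: number of fragments available at each R-group.
    picks: number of fragments to choose at each R-group.
    Returns (chosen fragments per R-group, on-bit fraction vs. full library).
    """
    full_bits = set().union(*products.values())
    best_score, best_choice = -1, None
    per_site = [combinations(range(n), k) for n, k in zip(n_fragments, picks)]
    for choice in product(*per_site):
        bits = set()
        for combo in product(*choice):  # every product in the sub-array
            bits |= products[combo]
        if len(bits) > best_score:
            best_score, best_choice = len(bits), choice
    return best_choice, best_score / len(full_bits)

# Made-up 3 x 2 virtual library; one product carries two extra bits.
products = {(i, j): {i, 10 + j} for i in range(3) for j in range(2)}
products[(2, 1)] |= {100, 101}
print(best_subarray(products, n_fragments=(3, 2), picks=(2, 1)))
```

Exhaustive enumeration grows combinatorially with library size, which is why a real implementation would rely on a stochastic or heuristic optimizer instead.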

Fingerprinting and clustering (C2·LibEngine)

C2·LibEngine uses a Markush representation of a library to rapidly produce Lipinski properties and fingerprints and, optionally, to run clustering on large combinatorial libraries. This is done without enumerating the library structures, yet it produces exact property values and fingerprints that include all the fragments that would cross R-group boundaries. Input is an MDL RG file, which may be exported from the analog builder (see the C2·Analog tutorial) or obtained from other software such as MDL's Project Library or Central Library. The software can optionally enumerate a library to SMILES, but this is not required to calculate the properties and/or fingerprints.

1.   Generating a fingerprint file and using it for subsetting constrained by molecular properties

A fingerprint for each individual structure is generated from the Markush structure for the library, and then a distance-based diversity metric is used to design a subset array of the library based on the fingerprint distances. The selection is constrained by desirable ranges of molecular properties.

Go to the LIB ENGINE card located on the COMBICHEM-II card deck and select the Setup and Run menu item.

Under Input file, select the file:
Cerius2-Resources/COMBICHEM/rgfiles/lib3.rg

Under Dictionary file, select the file bci1052.dic. Under Output BDF file, in the textbox, type lib3.bdf and click SELECT. Select the Generate Fingerprints and Generate Structural Descriptors checkboxes. Push the RUN LIB_ENGINE button.

This will generate the fingerprints and molecular properties for the 5,760 members of the library and deposit them in the bdf file.

Go to the BINARY DATA FILES card located on the COMBICHEM-I card deck and click Select BDF.

Click lib3.bdf to see the contents of the file and select the item BCI_FP bci1052.dic in the list box on the lower left.

These fingerprints can now be used in any similarity/diversity selection method. In this case a combinatorial selection will be made using the new Fingerprint On-bits metric.

Prior to running the selection, property rules will be set up to bias the selection.

Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select Restraints-->Property Ranges.

Click the Get Properties button, and select SlogP. Set the Upper Bound to 5 and click ADD. Select MW, set the Upper Bound to 500 and click ADD.

Click LIST to confirm that the properties have been added as you just specified.
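The two property ranges set above (SlogP no greater than 5, MW no greater than 500) can be expressed as a simple check on each product. The Python sketch below counts range violations per molecule and turns them into a penalty for a subset. How Cerius2 actually combines such a penalty with the diversity score is not described here, so the penalty definition (fraction of molecules with at least one violation), the inclusive bounds, and all names are assumptions.

```python
# (lower, upper) bounds per property; None means unbounded on that side.
RANGES = {"SlogP": (None, 5.0), "MW": (None, 500.0)}

def violations(props, ranges=RANGES):
    """Count how many property ranges a single molecule falls outside."""
    count = 0
    for name, (lower, upper) in ranges.items():
        value = props[name]
        if lower is not None and value < lower:
            count += 1
        elif upper is not None and value > upper:
            count += 1
    return count

def penalty(subset, ranges=RANGES):
    """Fraction of molecules in the subset with at least one violation."""
    if not subset:
        return 0.0
    bad = sum(1 for props in subset if violations(props, ranges) > 0)
    return bad / len(subset)

# Made-up property values for four products.
subset = [
    {"SlogP": 3.2, "MW": 410.0},
    {"SlogP": 5.6, "MW": 480.0},
    {"SlogP": 4.9, "MW": 512.3},
    {"SlogP": 5.0, "MW": 500.0},
]
print([violations(p) for p in subset], penalty(subset))
```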

On the LIBRARY ANALYSIS card, select Rgroup Subsetting --> Diverse Library.

Type 2,4,8 in the Number of fragments to select in each Rgroup text field. Select Fingerprint OnBits from the Diversity Metric pulldown; change Optimize to Diversity-Penalty; then push SELECT RGROUP FRAGMENTS.

This will select a 2 x 4 x 8 subset (2, 4, and 8 fragments at the three R-groups), reporting the reagents in the text port and creating a BDF file of the 64 selected products.

Finally, examine the property ranges of the subset that has been selected.

Go to the BINARY DATA FILES card located on the COMBICHEM-I card deck and click Select BDF. Make sure to update the directory contents by clicking the action button (square button marked with circle in the upper right of the file browser).

Click lib3_RgDiv.bdf to see the contents of the file and select the mol_wt and SlogP items (Control-Click to make multiple selections).

Go up to the DRUG DISCOVERY card deck, select the QSAR card and click Preferences-->Histograms....

Click Create Histogram and examine the distributions of molecular weight and logP in the subset that was selected.

2.   Generating a cluster file and using it for combinatorial subsetting

In this section, relocation clustering is run directly from the Markush structure, with fingerprints generated on the fly.

Each structure in the library is assigned a cluster number. These cluster numbers can then be used to pick a subset. For cherry-picking, one item may be selected from each cluster directly. Here, cell-based combinatorial subsetting is demonstrated, with the clusters used as cells. This form of selection is faster than the distance-based selection in section 1 and is therefore applicable to much larger libraries.

Go up to the COMBICHEM-II card deck, select the Lib Engine card and click Setup and Run.

Under Input file, select the file:
Cerius2-Resources/COMBICHEM/rgfiles/lib3.rg

Under Dictionary file, select the file bci1052.dic. Under Output BDF file type clus_lib3.bdf. Select the Generate Fingerprints checkbox and turn off (uncheck) the Generate Structural Descriptors checkbox. In the Number of Clusters text box, type 64 and hit return. Push the RUN LIB_ENGINE button.

This will generate the fingerprints for the 5,760 members of the library and then run k-means relocation clustering to produce 64 clusters.

Note that the number of clusters used is equivalent to the number of molecules that will ultimately be selected so that a perfectly diverse selection could choose one molecule per cell.
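When clusters are treated as cells, a natural diversity score for a combinatorial subset is the fraction of cells that contain at least one selected product. The Python sketch below computes that fraction for a chosen set of fragments. It is an interpretation of the idea described above, not the documented definition of the Cell-based Fraction metric, and the data layout and function name are hypothetical.

```python
from itertools import product

def cell_fraction(cluster_of, chosen_fragments, n_cells):
    """Fraction of cells occupied by the products of a fragment selection.

    cluster_of: dict mapping a tuple of fragment indices (one per R-group)
                to the cluster (cell) number assigned to that product.
    chosen_fragments: one sequence of chosen fragment indices per R-group.
    n_cells: total number of clusters (cells).
    """
    occupied = {cluster_of[combo] for combo in product(*chosen_fragments)}
    return len(occupied) / n_cells

# Made-up 2 x 2 library assigned to three cells.
cluster_of = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 0}
print(cell_fraction(cluster_of, [(0, 1), (0, 1)], n_cells=3))
print(cell_fraction(cluster_of, [(0,), (0, 1)], n_cells=3))
```

With 64 cells and 64 selected products, a score of 1.0 would correspond to the ideal case of exactly one product per cell.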

Go up to the COMBICHEM-I card deck, select the Binary Data Files card and click Select BDF. Make sure to update the directory contents by clicking the action button (square button marked with circle in the upper right of the file browser). Click clus_lib3.bdf and select the cluster column.

The cluster numbers can now be used as a basis for selection. In this case, a combinatorial selection will be made.

Go to the LIBRARY ANALYSIS card located on the COMBICHEM-I card deck and select Rgroup Subsetting --> Diverse Library.

Type 2,4,8 in the Number of fragments to select in each Rgroup text field. Make sure Optimize is set to Only Diversity. Set Diversity Metric to Cell-based Fraction. Push the Estimate Optimum Number of Cells button, then push SELECT RGROUP FRAGMENTS.

This will select a 2 x 4 x 8 subset (2, 4, and 8 fragments at the three R-groups), reporting the reagents in the text port and creating a BDF file of the 64 selected products.

This concludes the LibEngine tutorial. Note that to enumerate a library, a SMILES file name should be given in the SMILES file dialog box of the LibEngine Setup and Run dialog.




Last updated May 19, 2000 at 01:50PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.