Combi-Chem


3h. Importing and working with BDF files

The binary data file system in Cerius2 provides a means of working with large libraries without having to hold the libraries or their descriptors in memory. This makes it possible to work with libraries of hundreds of thousands to millions of structures. The binary file format is designed for high-speed random access, and methods operating on these files often run as fast as, if not faster than, equivalent operations run on the study table.

Two methods are available for calculating descriptors as of Cerius2 release 4.5: the regular method available in previous versions and a fast descriptor engine that works with a limited range of descriptors. This tutorial demonstrates both methods.

1.   Creating a binary data file using regular descriptors

A bdf file can be created from existing data in tables or ASCII files, or it can be written out as descriptors are calculated, as demonstrated in this example.

Start Cerius2 or start a new Cerius2 session by selecting the File/New Session item from the Visualizer menu bar. (Click Confirm in the Re-initialize All message box if you start a new session.) On the LIBRARY ANALYSIS card of the COMBI-CHEM I deck, select Show Study Table. Choose the default set of combinatorial chemistry descriptors by selecting Preferences/Defaults Set/COMBICHEM from the menu bar in the Study Table control panel. Add these descriptors to the empty study table by selecting Descriptors/Add Default on the menu bar in the Study Table control panel. Select Molecules/From SD File from the menu bar in the Study Table control panel to open the Add Molecules from SD File control panel. SELECT the file Cerius2-Resources/COMBICHEM/demos/dipep_400.sd. Click the Preferences button in the Add Molecules from SD File control panel. Set the SD File Preferences control panel by checking the following checkboxes and entering the bdf filename as dipep_400.bdf:
Delete Model After Adding,
Delete Row After Adding,
Output Row to BDF File.
The Add SD File Name, Type and Index to Table checkbox should already be checked.
(Append Model and Output Row to DAT file should be unchecked.)
Close this control panel and click IMPORT MOLECULES on the Add Molecules from SD File control panel.

For larger files, these options are crucial to conserving memory by removing the model and rows and outputting them directly to a bdf file, rather than attempting to load all structures into memory at once.
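The memory saving comes from handling one structure at a time. The following Python sketch illustrates the idea with a hypothetical fixed-width record layout. It is not the actual Cerius2 bdf format, which is not described here, and the names `write_rows`, `read_row`, and `fake_descriptors` are invented for illustration. Each row of descriptors is written out as soon as it is produced, and any row can later be read back by seeking directly to its offset.

```python
import os
import struct
import tempfile

N_DESC = 3                               # hypothetical number of descriptors per row
RECORD = struct.Struct(f"<{N_DESC}d")    # one fixed-width record of little-endian doubles


def write_rows(path, rows):
    """Stream rows to disk one at a time; nothing accumulates in memory."""
    count = 0
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(RECORD.pack(*row))
            count += 1
    return count


def read_row(path, index):
    """Random access: seek straight to record `index` and unpack it."""
    with open(path, "rb") as fh:
        fh.seek(index * RECORD.size)
        return RECORD.unpack(fh.read(RECORD.size))


def fake_descriptors(n):
    """Stand-in generator for a per-molecule descriptor calculation."""
    for i in range(n):
        yield (float(i), i * 0.5, i * i * 1.0)


demo_path = os.path.join(tempfile.mkdtemp(), "demo_rows.bin")
n_written = write_rows(demo_path, fake_descriptors(400))
row_250 = read_row(demo_path, 250)
```

Because every record has the same size, the cost of reading row 250 does not depend on how many rows precede it, which is the property that makes this kind of file practical for very large libraries.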

2.   Creating a bdf file using fast descriptors

If the descriptors that you want to calculate are implemented in the fast descriptor module, then this method is preferred, since it is approximately two orders of magnitude faster.

On the BINARY DATA FILES card of the COMBI-CHEM I deck select Fast Descriptors. Click the Select Molecules action button in the Fast Descriptors control panel and, in the Select SD file for Fast Descriptors control panel, SELECT the file Cerius2-Resources/COMBICHEM/demos/dipep_400.sd, then close the file-selection control panel. Next click the Select Descriptors action button in the Fast Descriptors control panel. In the Select Fast Descriptors control panel, select all 5 descriptors in the table on the left side and click the right-facing arrow to place them in the descriptors listbox on the right. Close the descriptor-selection control panel. Enter example.bdf in the Output to BDF file entry box on the Fast Descriptors control panel and click the CALCULATE DESCRIPTORS pushbutton.

3.   Working with bdf files--principal components analysis

You can now run calculations, such as principal components analysis, on the bdf file.

From the BINARY DATA FILES card on the COMBI-CHEM I deck choose Select BDF. The Binary Data File control panel shows the list of descriptors that are available. SELECT the mol_data.bdf file that was created automatically in Step 1. Click the Select All action button to select all the descriptors and then click the RUN pushbutton to start the PCA calculation.

This creates a .dep file containing the PCA equations. The actual values of the PCs are not stored, since it is more efficient to calculate them on demand from the equations.
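To see why storing only the equations is sufficient, note that each principal component is a linear combination of the centered descriptors. The Python sketch below evaluates such equations for a single row. The means and loadings are made-up numbers, the layout of a real .dep file is not shown in this tutorial, and whether Cerius2 also scales the descriptors is not stated here, so treat this as an illustration of the principle only.

```python
# Hypothetical PCA "equations": descriptor means plus one row of loadings per PC.
# The numbers are invented for illustration; they are not read from a .dep file.
MEANS = [2.0, 10.0, 0.5]
LOADINGS = {
    "PC1": [0.6, 0.8, 0.0],
    "PC2": [-0.8, 0.6, 0.0],
    "PC3": [0.0, 0.0, 1.0],
}


def pc_scores(descriptors, means=MEANS, loadings=LOADINGS):
    """Evaluate each PC equation for one row: sum_j w_j * (x_j - mean_j)."""
    centered = [x - m for x, m in zip(descriptors, means)]
    return {
        name: sum(w * c for w, c in zip(weights, centered))
        for name, weights in loadings.items()
    }


scores = pc_scores([3.0, 15.0, 0.5])
```

Evaluating a row costs only a few multiplications per component, so recomputing the values on demand is cheap compared with storing three extra columns for every structure.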

Select the mol_data_pca.dep file by clicking the Browse button next to the Dependent Properties from entry box on the Binary Data File control panel. Then click the Dependent Properties from action button.

This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox in the Binary Data File control panel.

Select only PC1, PC2, and PC3 in the listbox of the Binary Data File control panel and mark them as X variables by clicking the X tool in that control panel.

4.   Plotting from a bdf file

You can plot directly from a bdf file without loading the data into a study table.

On the BINARY DATA FILES card in the COMBI-CHEM I deck select 3D Plot from BDF. In the 3Dplot from BDF control panel, change the Label Row by popup to Pickable. Click the CREATE 3DPLOT pushbutton and examine the plot in the model window.

5.   Selecting from a bdf file

Although our example uses only 400 molecules, you can use the bdf file system to derive subsets of very large libraries via a cell-based procedure.

With the three principal components PC1, PC2, and PC3 still selected in the list in the Binary Data File control panel, go to the LIBRARY ANALYSIS card and select the Select Molecules/Diverse/Distance-based menu item. In the Select Diverse control panel, change the number of molecules to select to 40. Check the Create New BDF File with Selected Rows checkbox, change the filename to dipep_400_selected.bdf, and click the SELECT button.

This file can now be used to examine the selections, and the selected points are shown in the model window.
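The exact algorithm behind the distance-based selection is not described in this tutorial. As a rough illustration of how a distance-based diverse subset can be chosen, the Python sketch below uses the common greedy MaxMin heuristic on randomly generated three-dimensional points standing in for PC1, PC2, and PC3. The function name `maxmin_select` and the data are assumptions made for this example.

```python
import math
import random


def maxmin_select(points, k, start=0):
    """Greedy MaxMin: repeatedly add the point farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


rng = random.Random(0)                       # fixed seed so the demo is repeatable
library = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
subset = maxmin_select(library, 40)          # 40 row indices out of 400
```

Each pass over the library needs only the current point and a running nearest-distance value, so the same idea can be applied row by row to data held on disk.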

6.   Library comparison using bdf files

There are two ways of comparing libraries:

Using a reference chemical space derived from a single set of molecules (a drug database or set of actives from a screen, for example) and projecting a candidate library into it.

Deriving a common chemical space from two or more libraries.

Either way, the two separate libraries are held in two separate bdf files, and principal components are calculated and applied across both files.

a.   Using a reference chemical space

This exercise projects a set of isoxazoles into the space defined by the benzodiazepines.

From the BINARY DATA FILES card in the COMBI-CHEM I deck choose Select BDF. SELECT Cerius2-Resources/COMBICHEM/demos/benzo_720.bdf. Click Select All and then RUN to compute the principal components. Select the file benzo_720_pca.dep by clicking the Browse button next to the Dependent Properties from entry box. Then click the Dependent Properties from action button.

This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox.

Select only PC1, PC2, and PC3 in the listbox of the Binary Data File control panel and mark them as X variables by clicking the X tool in that control panel. On the BINARY DATA FILES card in the COMBI-CHEM I deck select 3D Plot from BDF. Change the Model name to benzo. Click the CREATE 3DPLOT button and examine the plot in the model window.

This set of PCA equations is now applied to the set of isoxazoles.

From the BINARY DATA FILES card in the COMBI-CHEM I deck choose Select BDF. SELECT Cerius2-Resources/COMBICHEM/demos/isox_625.bdf. Select the file benzo_720_pca.dep by clicking the Browse button next to the Dependent Properties from entry box. Then click the Dependent Properties from button.

This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox.

Using a different color and giving the plot a different name, plot these PCs as described above. Overlay the plots by selecting the Overlay icon on the Visualizer panel and by marking both plots Visible.

b.   Using a common chemical space

In this exercise principal components are derived over multiple libraries contained in multiple bdf files.

From the BINARY DATA FILES card in the COMBI-CHEM I deck choose Select BDF. SELECT Cerius2-Resources/COMBICHEM/demos/benzo_720.bdf. Click the Select All button. Select the BDF Preferences pushbutton. Use the With PCA file browser in the BDF Preferences control panel to add the Cerius2-Resources/COMBICHEM/demos/isox_625.bdf file to this list. Ensure that both files are selected in the list box on the right and then click the RUN button in the Binary Data File control panel to run the PCA calculation.

This produces a file named benzo_720_isox_625.dep, which contains PCA equations calculated using the data in both files.

The libraries can be plotted using the procedure in Step a above to create two overlaid plots. Use the benzo_720_isox_625.dep file to produce the PC1, PC2, and PC3 columns for both files.

More than two libraries can be included in this list.
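A common chemical space amounts to computing one mean and one covariance matrix over the rows of all libraries together, then extracting components from that pooled matrix. The Python sketch below shows this for any number of libraries, using tiny made-up two-descriptor data sets and power iteration to find the leading component. The function names and data are hypothetical, and the numerical method Cerius2 actually uses is not described in this tutorial.

```python
import math


def pooled_stats(libraries):
    """Accumulate count, sums and cross-products over all libraries in one pass."""
    n, d = 0, None
    for rows in libraries:
        for row in rows:
            if d is None:                       # size the accumulators on the first row
                d = len(row)
                sums = [0.0] * d
                cross = [[0.0] * d for _ in range(d)]
            n += 1
            for i in range(d):
                sums[i] += row[i]
                for j in range(d):
                    cross[i][j] += row[i] * row[j]
    mean = [v / n for v in sums]
    # Population covariance: E[x_i * x_j] - mean_i * mean_j.
    cov = [[cross[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]
    return mean, cov


def leading_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Two made-up "libraries", each a list of descriptor rows.
benzo_like = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
isox_like = [(3.0, 4.0), (4.0, 3.0)]
mean, cov = pooled_stats([benzo_like, isox_like])
pc1 = leading_component(cov)
```

Only running sums are kept while the rows are read, so no library has to be loaded in full, and adding a third library is just one more entry in the list.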

7.   Comparing libraries

Once two libraries have been projected into a common frame of reference using method a or b above, they can be compared using the techniques in the library comparison module. This example shows the selection of a subset of isoxazoles to complement the benzodiazepines.

From the COMBI-CHEM I card deck select the LIBRARY COMPARISON card. Select Complement Library. In the Complement Library control panel, set the Libraries from popup to BDF Files. Select Cerius2-Resources/COMBICHEM/demos/isox_625.bdf as the Source Library and Cerius2-Resources/COMBICHEM/demos/benzo_720.bdf as the Target Library. Click the Select Common Descriptors pushbutton. In the Common Descriptors control panel, set the Get Dependent Properties from entry box to benzo_720_isox_625.dep. Select the PC1, PC2, PC3 descriptors that appear in the listbox in the Common Descriptors control panel. Finally, click the SELECT pushbutton on the Complement Library control panel.

If you want to see graphical results of the selection, you need to have plotted the libraries as mentioned above.

The selected molecules are saved in a new bdf file.
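One plausible way to think about complementing a library is to pick source molecules that lie far from everything in the target library, so that they fill regions of the common space the target does not cover. The Python sketch below implements that idea as a greedy farthest-point selection seeded with the target. It is an assumption made for illustration, not a description of the Complement Library algorithm, and the coordinates are invented.

```python
import math


def complement_select(source, target, k):
    """Pick k source points that best fill gaps left by the target library.

    Greedy MaxMin seeded with the target: each pick is the source point
    farthest from everything in the target plus the picks made so far.
    """
    nearest = [min(math.dist(s, t) for t in target) for s in source]
    picks = []
    for _ in range(k):
        best = max(range(len(source)), key=nearest.__getitem__)
        picks.append(best)
        for i, s in enumerate(source):
            nearest[i] = min(nearest[i], math.dist(s, source[best]))
    return picks


target_lib = [(0.0, 0.0), (1.0, 0.0)]                          # hypothetical PC coordinates
source_lib = [(0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (0.0, 4.0)]  # hypothetical PC coordinates
picked = complement_select(source_lib, target_lib, 2)
```

In this toy case the first pick is the source point farthest from the target, and the second pick avoids its close neighbor in favor of a point in a different, still uncovered region.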




Last updated May 19, 2000 at 01:51PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.