
3h. Importing and working with BDF files
The binary data file system in Cerius2 provides a means of working with large libraries without having to hold the libraries or their descriptors in memory. This makes it possible to work with libraries of hundreds of thousands to millions of structures. The binary file format is designed for high-speed random access, and methods operating on these files often run as fast as, if not faster than, equivalent operations run on the study table.
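The actual Cerius2 bdf layout is not documented in this tutorial, so the following is only a minimal sketch of why a binary file suits random access. It assumes a hypothetical format: a small header that gives the row and descriptor counts, followed by fixed-size rows of 64-bit floats. Because every row is the same size, the byte offset of any row can be computed directly, so a single row can be read without loading the rest of the file. The names write_table, read_row, and HEADER are illustrative and are not part of Cerius2.

```python
import io
import struct
from typing import List, Sequence

# Hypothetical header: number of rows, number of descriptors (little-endian uint32s).
HEADER = struct.Struct("<II")


def write_table(f, rows: Sequence[Sequence[float]]) -> None:
    """Write descriptor rows as fixed-size records of float64 values."""
    n_desc = len(rows[0])
    f.write(HEADER.pack(len(rows), n_desc))
    row_fmt = struct.Struct(f"<{n_desc}d")
    for row in rows:
        f.write(row_fmt.pack(*row))


def read_row(f, index: int) -> List[float]:
    """Read one row by seeking straight to its offset; no other rows are loaded."""
    f.seek(0)
    n_rows, n_desc = HEADER.unpack(f.read(HEADER.size))
    if not 0 <= index < n_rows:
        raise IndexError(index)
    row_fmt = struct.Struct(f"<{n_desc}d")
    f.seek(HEADER.size + index * row_fmt.size)
    return list(row_fmt.unpack(f.read(row_fmt.size)))


buf = io.BytesIO()  # stands in for a real file opened with open(path, "w+b")
write_table(buf, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(read_row(buf, 2))  # [5.0, 6.0]
```

The fixed record size is the design choice that matters here: any layout where row offsets can be computed without scanning gives constant-time access to a row regardless of library size.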
As of Cerius2 release 4.5, two methods are available for calculating descriptors: the regular method available in previous versions, and a fast descriptor engine that supports a limited range of descriptors. This tutorial demonstrates both methods.
1. Creating a binary data file using regular descriptors
Bdf files can be created from existing data in tables or ASCII files, or written directly as descriptors are calculated, as this example demonstrates.
For larger libraries, these options are crucial for conserving memory: the model and rows are removed and their data written directly to the bdf file, rather than all structures being loaded into memory at once.
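The memory saving comes from never holding more than one structure's descriptors at a time. The following minimal sketch shows that streaming pattern, again assuming a hypothetical fixed-record layout (a header of row and descriptor counts, then float64 rows) rather than the real bdf format. Each row is written as soon as it is computed, and the row count, which is unknown until the input is exhausted, is patched into the header at the end. The function names and the toy descriptor function (string length and count of uppercase "C" in a SMILES string) are purely illustrative.

```python
import io
import struct
from typing import Callable, Iterable, Sequence

HEADER = struct.Struct("<II")  # hypothetical header: row count, descriptor count


def stream_to_binary(f, structures: Iterable[str],
                     descriptor_fn: Callable[[str], Sequence[float]],
                     n_desc: int) -> int:
    """Compute descriptors one structure at a time and append each row to f."""
    f.write(HEADER.pack(0, n_desc))        # placeholder; row count not known yet
    row_fmt = struct.Struct(f"<{n_desc}d")
    count = 0
    for s in structures:                   # only the current structure is in memory
        f.write(row_fmt.pack(*descriptor_fn(s)))
        count += 1
    end = f.tell()
    f.seek(0)
    f.write(HEADER.pack(count, n_desc))    # patch the real row count into the header
    f.seek(end)
    return count


def toy_descriptors(smiles: str) -> Sequence[float]:
    # Illustrative only: string length and number of uppercase 'C' characters.
    return [float(len(smiles)), float(smiles.count("C"))]


buf = io.BytesIO()
n = stream_to_binary(buf, iter(["CCO", "c1ccccc1", "CN"]), toy_descriptors, 2)
print(n)  # 3
```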
2. Creating a bdf file using fast descriptors
If the descriptors you want to calculate are implemented in the fast descriptor module, this method is preferred, since it is roughly two orders of magnitude (about 100 times) faster.
3. Working with bdf files--principal components analysis
You can now run calculations, such as principal components analysis, on the bdf file.
This creates a .dep file containing the PCA equations. The actual values of the PCs are not stored, since it is more efficient to calculate them on demand from the equations.
This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox in the Binary Data File control panel.
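Evaluating principal components on demand amounts to applying a stored linear transformation to each row only when it is needed. The following sketch assumes the equations consist of per-descriptor means, per-descriptor scale factors, and one coefficient vector per component. The PCAEquations class is a hypothetical in-memory stand-in and does not reflect the actual .dep file syntax.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Sequence


@dataclass
class PCAEquations:
    """Hypothetical representation of stored PCA equations (not the .dep syntax)."""
    means: List[float]            # per-descriptor mean used to centre the data
    scales: List[float]           # per-descriptor scale, e.g. standard deviation
    loadings: List[List[float]]   # one coefficient vector per principal component

    def evaluate(self, row: Sequence[float]) -> Dict[str, float]:
        """Compute PC1, PC2, ... for a single row of descriptor values."""
        z = [(x - m) / s for x, m, s in zip(row, self.means, self.scales)]
        return {f"PC{k + 1}": sum(c * v for c, v in zip(coeffs, z))
                for k, coeffs in enumerate(self.loadings)}

    def evaluate_all(self, rows: Iterable[Sequence[float]]) -> Iterator[Dict[str, float]]:
        """Lazily yield PC values row by row; no PC values are stored."""
        for row in rows:
            yield self.evaluate(row)


eq = PCAEquations(means=[1.0, 2.0], scales=[1.0, 2.0],
                  loadings=[[0.6, 0.8], [0.8, -0.6]])
print(eq.evaluate([2.0, 4.0]))  # approximately {'PC1': 1.4, 'PC2': 0.2}
```

Storing only the means, scales, and coefficients costs a few numbers per descriptor, whereas storing the PC values would cost one number per component per molecule.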
4. Plotting from a bdf file
You can plot directly from a bdf file without loading the data into a study table.
5. Selecting from a bdf file
Although our example uses only 400 molecules, you can use the bdf file system to derive subsets of very large libraries via a cell-based procedure.
This file can now be used to examine the selections, and the selected points are shown in the model window.
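The tutorial does not describe the Cerius2 cell-based procedure in detail. The following is a minimal sketch of one common variant: each PC axis is divided into a fixed number of equal-width bins between its minimum and maximum, and the molecule nearest the centre of each occupied cell is kept as that cell's representative. The function names and the bins parameter are illustrative. Each molecule is visited once to assign it to a cell, which is part of what makes this kind of selection practical for very large libraries.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def cell_index(p: Point, lows: Point, highs: Point, bins: int) -> Tuple[int, ...]:
    """Map a point to its cell: equal-width bins between each axis's min and max."""
    idx = []
    for x, lo, hi in zip(p, lows, highs):
        if hi == lo:                      # degenerate axis: everything in one bin
            idx.append(0)
        else:
            k = int((x - lo) / (hi - lo) * bins)
            idx.append(min(k, bins - 1))  # the maximum value belongs to the last bin
    return tuple(idx)


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cell_based_select(points: List[Point], bins: int) -> List[int]:
    """Return one representative index per occupied cell (nearest the cell centre)."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    cells: Dict[Tuple[int, ...], List[int]] = {}
    for i, p in enumerate(points):        # single pass to bin every molecule
        cells.setdefault(cell_index(p, lows, highs, bins), []).append(i)
    selected = []
    for key, members in sorted(cells.items()):
        centre = [lo + (k + 0.5) * (hi - lo) / bins
                  for k, lo, hi in zip(key, lows, highs)]
        selected.append(min(members, key=lambda i: sq_dist(points[i], centre)))
    return selected


# Toy PC1/PC2 scores for five molecules (illustrative values only).
points = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 0.95), (0.0, 1.0)]
print(cell_based_select(points, bins=2))  # [1, 4, 3]
```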
6. Library comparison using bdf files
There are two ways of comparing libraries:
Using a reference chemical space derived from a single set of molecules (a drug database or set of actives from a screen, for example) and projecting a candidate library into it.
Deriving a common chemical space from two or more libraries.
In either case, each library is held in its own bdf file, and the principal components are calculated and applied across both files.
a. Using a reference chemical space
This exercise projects a set of isoxazoles into the space defined by the benzodiazepines.
This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox.
This set of PCA equations is now applied to the set of isoxazoles.
This causes three new variables (PC1, PC2, and PC3) to appear in the descriptors listbox.
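The key point of the reference-space approach is that the centring and the PC coefficients come only from the reference library. The candidate library is transformed with those same quantities and is never refitted. The following minimal pure-Python sketch assumes mean-centred, unscaled descriptors and extracts components by power iteration with deflation. Cerius2's own PCA implementation, and any descriptor scaling it applies, may differ. The helper names and the data are illustrative toy values.

```python
import math
import random
from typing import List, Sequence, Tuple

Components = List[Tuple[float, List[float]]]  # (eigenvalue, unit loading vector)


def fit_pca(rows: Sequence[Sequence[float]], k: int,
            iters: int = 500, seed: int = 0) -> Tuple[List[float], Components]:
    """Return the column means and the top k components of the covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    rng = random.Random(seed)
    comps: Components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration: v <- Cv / |Cv|
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append((lam, v))
        for i in range(d):  # deflate so the next pass finds the next component
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return mean, comps


def project(rows, mean, comps) -> List[List[float]]:
    """Scores of each row on the given components, centred on the supplied mean."""
    return [[sum((x - m) * vj for x, m, vj in zip(row, mean, v)) for _, v in comps]
            for row in rows]


reference = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.5], [3.0, 6.0], [1.0, 1.5], [2.0, 3.5]]
candidates = [[1.5, 3.0], [4.0, 7.0]]
mean, comps = fit_pca(reference, k=2)           # fitted on the reference set only
cand_scores = project(candidates, mean, comps)  # candidates mapped into that space
```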
b. Using a common chemical space
In this exercise, principal components are derived over multiple libraries contained in multiple bdf files.
This produces a file named benzo_720_isox_625.dep, which contains PCA equations calculated using the data in both files.
The libraries can be plotted using the procedure in Step a above to create two overlaid plots. Use the benzo_720_isox_625.dep file to produce the PC1, PC2, and PC3 columns for both files.
More than two libraries can be included in this list.
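Deriving a common space differs from the reference-space approach only in which rows the PCA is fitted to. The libraries are pooled, so the mean and the components reflect all of them, and each library is then projected with those shared quantities. The following minimal sketch assumes mean-centred descriptors and a PCA computed by power iteration with deflation. The library names and numbers are toy values, the helper names are hypothetical, and Cerius2 may scale descriptors or weight the libraries differently.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Components = List[Tuple[float, List[float]]]  # (eigenvalue, unit loading vector)


def fit_pca(rows: Sequence[Sequence[float]], k: int,
            iters: int = 500, seed: int = 0) -> Tuple[List[float], Components]:
    """Column means plus the top k covariance eigenpairs via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps: Components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        v = [x / math.sqrt(sum(y * y for y in v)) for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append((lam, v))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps


def project(rows, mean, comps) -> List[List[float]]:
    return [[sum((x - m) * vj for x, m, vj in zip(row, mean, v)) for _, v in comps]
            for row in rows]


def common_space(libraries: Dict[str, List[List[float]]], k: int):
    """Fit one PCA on all libraries pooled, then project each library into it."""
    pooled = [row for rows in libraries.values() for row in rows]
    mean, comps = fit_pca(pooled, k)
    scores = {name: project(rows, mean, comps) for name, rows in libraries.items()}
    return mean, comps, scores


libs = {  # toy descriptor values, not real data
    "benzo": [[0.0, 0.0], [1.0, 2.0], [2.0, 4.5], [1.0, 1.5]],
    "isox": [[3.0, 6.0], [4.0, 7.5], [2.5, 5.5]],
}
mean, comps, scores = common_space(libs, k=2)
```

Because any number of libraries can be placed in the dictionary, the same function covers the case of more than two libraries.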
7. Comparing libraries
Once two libraries have been projected into a common frame of reference using method a or b above, they can be compared using the techniques in the library comparison module. This example shows the selection of a subset of isoxazoles that complements the benzodiazepines.
To see the results of the selection graphically, you must first have plotted the libraries as described above.
The selected molecules are saved in a new bdf file.
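Once both libraries share a frame of reference, one simple way to pick a complementary subset is to keep candidates that fall in cells the reference library leaves empty. The following sketch assumes equal-width bins over the pooled range of each PC and keeps the first candidate found in each newly occupied cell. It illustrates the idea only and is not necessarily the criterion used by the library comparison module. The function names, parameters, and scores are hypothetical toy values.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def cell_index(p: Point, lows: Point, highs: Point, bins: int) -> Tuple[int, ...]:
    """Equal-width binning of each axis between the given lower and upper bounds."""
    idx = []
    for x, lo, hi in zip(p, lows, highs):
        k = 0 if hi == lo else int((x - lo) / (hi - lo) * bins)
        idx.append(min(k, bins - 1))  # the maximum value belongs to the last bin
    return tuple(idx)


def complementary_subset(reference: List[Point], candidates: List[Point],
                         bins: int) -> List[int]:
    """Indices of candidates that occupy cells left empty by the reference library."""
    pooled = list(reference) + list(candidates)  # shared bounds for both libraries
    lows = [min(c) for c in zip(*pooled)]
    highs = [max(c) for c in zip(*pooled)]
    occupied = {cell_index(p, lows, highs, bins) for p in reference}
    chosen: Dict[Tuple[int, ...], int] = {}
    for i, p in enumerate(candidates):
        key = cell_index(p, lows, highs, bins)
        if key not in occupied and key not in chosen:
            chosen[key] = i  # first candidate found in a newly covered cell
    return sorted(chosen.values())


reference = [(0.0, 0.0), (0.2, 0.3)]                            # toy reference PC scores
candidates = [(0.1, 0.1), (0.9, 0.9), (0.95, 0.8), (0.1, 0.9)]  # toy candidate PC scores
print(complementary_subset(reference, candidates, bins=2))  # [1, 3]
```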
Last updated May 19, 2000 at 01:51PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.