Combi-Chem

Introduction
The library comparison capabilities found in the C2·LibCompare module allow you to define and compare combinatorial libraries (or any sets of models) in terms of diversity and sampling of property space.
Controls in this panel allow you to:
It is often possible to compare two libraries qualitatively by visually inspecting the compounds in property or principal component space. The library comparison functionality provides a quantitative measure that should confirm your visual assessment.
Several methods are available for comparing libraries. To open the corresponding control panels, click the Compare Libraries menu item on the LIBRARY COMPARISON card in the COMBI-CHEM I deck, then select one of the items on its submenu.
1. Generate a set of random points that sample the property space covered by the two libraries, and calculate the distance between each random point and the closest model in library 1 and in library 2.
You have control over several options that govern the library comparison process:
where $D_{ij}$ is the distance between model $i$ and random point $j$, the sum runs over all the independent variables (properties) $k = 1, \dots, N$, $X_{i,k}$ is the value of property $k$ for model $i$, and $X_{j,k}$ is the value of property $k$ for random point $j$.
The Compare Libraries Similarity control panel gives you a number of options for displaying the data as plots, histograms, or tables, as well as an option to select rows in the study table that satisfy user-specified comparison criteria (minimum, average, and maximum distances).
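The random-point comparison in step 1 can be sketched in Python as follows. The formula that defines $D_{ij}$ is not reproduced in this section, so the sketch assumes the Euclidean distance $D_{ij} = \sqrt{\sum_k (X_{i,k} - X_{j,k})^2}$, and it assumes the random points are drawn uniformly from the bounding box of both libraries. The function names are hypothetical and are not part of C2·LibCompare.

```python
import math
import random

def euclidean(a, b):
    # Assumed metric: D_ij = sqrt(sum over k of (X_i,k - X_j,k)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def random_point_distances(lib1, lib2, n_points=1000, seed=0):
    """For each random point in the joint bounding box of both libraries,
    return (distance to closest model in lib1, distance to closest model in lib2)."""
    rng = random.Random(seed)
    models = lib1 + lib2
    n_props = len(models[0])
    lows = [min(m[k] for m in models) for k in range(n_props)]
    highs = [max(m[k] for m in models) for k in range(n_props)]
    results = []
    for _ in range(n_points):
        point = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        d1 = min(euclidean(point, m) for m in lib1)
        d2 = min(euclidean(point, m) for m in lib2)
        results.append((d1, d2))
    return results
```

Summarizing the two distance columns (for example, by their minimum, average, and maximum) indicates which library leaves less of the shared property space far from any of its models.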
When using the option to compare libraries based on counting empty or occupied cells (see below), a cell is considered occupied by reference or by candidate molecules only if the number of molecules is greater than or equal to a user-specified minimum.
The cell-based library comparison method is accessed from the Compare Libraries/Cell Based menu item on the LIBRARY COMPARISON card in the COMBI-CHEM I deck, which opens the Compare Libraries Cell Based control panel.
Cell-based library comparison can be used with data in BDF files or in the QSAR study table. The space occupied by the two libraries can be binned to obtain either a specified total number of cells or a specified number of cells occupied by at least one molecule. In both cases the optimum binning algorithm is used; it tries to divide the properties so that the cells have sides as similar as possible.
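The optimum binning algorithm itself is not described here. As a rough illustration of the idea of cells with similar sides, the following Python sketch uses a hypothetical heuristic for the "specified total number of cells" case: it picks one common side length so that the product of the per-property bin counts is close to the target, then maps each molecule to a flat cell index. Both functions are assumptions for illustration only.

```python
import math

def bins_per_property(ranges, target_cells):
    """Hypothetical heuristic: choose one common side length s so that
    prod(range_k / s) is about target_cells, then n_k = round(range_k / s).
    Assumes every property range is strictly positive."""
    n = len(ranges)
    side = (math.prod(ranges) / target_cells) ** (1.0 / n)
    return [max(1, round(r / side)) for r in ranges]

def cell_index(x, lows, highs, nbins):
    """Map one property vector to a flat (row-major) cell index.
    Assumes lows[k] <= x[k] <= highs[k] for every property k."""
    idx = 0
    for v, lo, hi, nb in zip(x, lows, highs, nbins):
        width = (hi - lo) / nb
        b = min(int((v - lo) / width), nb - 1)  # upper edge goes in the last bin
        idx = idx * nb + b
    return idx
```

For example, `bins_per_property([20.0, 5.0], 100)` returns `[20, 5]`: the wider property receives more bins, so each cell is roughly square.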
The comparison metrics can be calculated in two ways (Compare Libraries Based on popup):
| Cell number | Candidate molecules | Reference molecules |
|---|---|---|
| 1 | 1 | 8 |
| 2 | 5 | 3 |
| 3 | 2 | 0 |
| 4 | 0 | 6 |
| 5 | 4 | 10 |
| 6 | 2 | 1 |
| 7 | 0 | 9 |
| 8 | 0 | 0 |
| 9 | 1 | 11 |
| 10 | 3 | 0 |
For the distribution in this table, the comparison metrics obtained using a) empty or occupied cells, with a minimum of 1 molecule per cell for a cell to count as occupied, b) empty or occupied cells, with a minimum of 3 molecules per cell, or c) the actual number of molecules per cell, are:
You can also use the by Reference Molecules popup to select molecules in the candidate library that occupy cells not occupied by reference molecules (different or new molecules) or select candidate molecules that are in cells already occupied by reference molecules (similar molecules).
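The occupied-cell bookkeeping for the example table can be reproduced with a short Python sketch. It only tallies which cells count as occupied at a given minimum and how many candidate molecules fall in cells without reference molecules; the exact metric formulas reported by the control panel are not given here, so the sketch does not attempt to reproduce them. The function names are hypothetical.

```python
# Molecule counts per cell from the example table: cell -> (candidate, reference)
cells = {
    1: (1, 8), 2: (5, 3), 3: (2, 0), 4: (0, 6), 5: (4, 10),
    6: (2, 1), 7: (0, 9), 8: (0, 0), 9: (1, 11), 10: (3, 0),
}

def occupancy(cells, min_per_cell):
    """Classify cells; a cell counts as occupied by a library only if it
    holds at least min_per_cell molecules from that library."""
    cand = {c for c, (nc, _) in cells.items() if nc >= min_per_cell}
    ref = {c for c, (_, nr) in cells.items() if nr >= min_per_cell}
    return {
        "candidate": sorted(cand),
        "reference": sorted(ref),
        "common": sorted(cand & ref),
        "candidate_only": sorted(cand - ref),
        "reference_only": sorted(ref - cand),
        "empty": sorted(set(cells) - cand - ref),
    }

def new_candidate_count(cells, min_per_cell=1):
    """Number of candidate molecules in cells not occupied by reference molecules."""
    ref = {c for c, (_, nr) in cells.items() if nr >= min_per_cell}
    return sum(nc for c, (nc, _) in cells.items() if c not in ref)
```

With a minimum of 1 molecule per cell, cells 3 and 10 hold only candidate molecules (5 "new" molecules in total), cells 4 and 7 hold only reference molecules, and cell 8 is empty. Raising the minimum to 3 leaves cells 2 and 5 as the only cells occupied by both libraries.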
Cosine-coefficient diversity and similarity
To set up and apply the cosine-coefficient diversity metric (see Theory), use the Compare Libraries Cosine Coeff Diversity and the Rgroup subsetting Diverse Library control panels. Open the Compare Libraries Cosine Coeff Diversity control panel by selecting Compare Libraries/Cosine Coeff Diversity on the LIBRARY COMPARISON card. Open the Rgroup subsetting Diverse Library control panel by selecting Rgroup Subsetting/Diverse Library on the LIBRARY ANALYSIS card and then set the Diversity Metric popup to Cosine-Coeff Div.
The cosine-coefficient similarity metric compares two libraries by computing the diversity of library A (the candidate library), the diversity of library B (the reference library), and the change in diversity when library A is added to library B. This metric works with numeric descriptors and with 2D fingerprints.
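The exact definition is given in the Theory section, which is not reproduced here. The Python sketch below assumes one common formulation: diversity is one minus the mean cosine similarity over all ordered pairs of a library (self-pairs included). Under that assumption the mean equals the squared length of the centroid of the unit-normalized vectors, so no pairwise loop is needed. The function names are hypothetical.

```python
import math

def unit(v):
    # Assumes v is not the zero vector (an all-zero fingerprint has no direction)
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_diversity(library):
    """Assumed definition: 1 - mean cosine similarity over all N*N ordered pairs.
    That mean equals |centroid of unit vectors|^2."""
    units = [unit(v) for v in library]
    n = len(units)
    centroid = [sum(col) / n for col in zip(*units)]
    return 1.0 - sum(c * c for c in centroid)

def diversity_change(candidate, reference):
    """Change in diversity when candidate library A is added to reference library B."""
    return cosine_diversity(reference + candidate) - cosine_diversity(reference)
```

Two parallel vectors give a diversity of 0, and two orthogonal vectors give 0.5, the largest value possible for two non-negative vectors under this definition.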
Fingerprints OnBits metrics
See also 3D Pharmacophore fingerprints (3DKeys). The fingerprint OnBits metric can be used with both 2D and 3D fingerprints. It is based on generating a "modal fingerprint" for a set of N molecules, in which a bit is on if it is on in at least one molecule in the set.
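Because a bit is on in the modal fingerprint whenever it is on in any member, the modal fingerprint is the bitwise OR (set union) of the individual fingerprints. The Python sketch below represents each fingerprint as a set of on-bit indices, which is an assumption made for readability.

```python
def modal_fingerprint(fingerprints):
    """Return the set of bits that are on in at least one molecule (a bitwise OR)."""
    modal = set()
    for fp in fingerprints:
        modal |= fp
    return modal

fps = [{1, 4, 9}, {4, 7}, {2, 9}]
on_bits = len(modal_fingerprint(fps))  # number of on bits in the modal fingerprint
```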
In the functionality accessed by selecting the Rgroup Subsetting/Diverse Library menu item on the LIBRARY ANALYSIS card and setting the Diversity Metric (in the Rgroup subsetting Diverse Library control panel) to Fingerprint OnBits, libraries are designed to maximize the number of on bits in the modal 2D and/or 3D fingerprint of the sublibrary.
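The optimizer used by the Diverse Library panel is not described here. To illustrate the objective (more on bits in the sublibrary's modal fingerprint), the Python sketch below substitutes a simple greedy max-coverage heuristic; it is a swapped-in technique, not the module's method, and the function name is hypothetical. Fingerprints are again assumed to be sets of on-bit indices.

```python
def greedy_onbits_subset(fingerprints, k):
    """Greedy stand-in: repeatedly add the molecule that turns on the most
    bits not yet present in the sublibrary's modal fingerprint."""
    covered = set()
    chosen = []
    remaining = dict(enumerate(fingerprints))
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

Greedy selection is fast and easy to follow, but it can miss combinations that a stochastic search would find, so it should be read as an illustration of the objective rather than of the optimizer.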
The modal 3D fingerprint of the candidate library is compared with the modal fingerprint of the reference library, reporting the number of on bits in each library, the number of common bits, the number of on bits in the candidate library not present in the reference library, and the number of on bits in the reference library not present in the candidate library. Options in the Compare Libraries 3D Fingerprints Onbits control panel allow you to list the molecules in the candidate library with on bits present in the reference library and to select the top N molecules from the candidate library (the ones with the highest number of common bits).
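The counts reported by this comparison, and the top-N selection by common bits, can be sketched in Python as follows. Fingerprints are assumed to be sets of on-bit indices, and the function names are hypothetical.

```python
def compare_modal(candidate_fps, reference_fps):
    """Compare the modal (union) fingerprints of two libraries."""
    cand = set().union(*candidate_fps)
    ref = set().union(*reference_fps)
    return {
        "candidate_on": len(cand),
        "reference_on": len(ref),
        "common": len(cand & ref),
        "candidate_only": len(cand - ref),
        "reference_only": len(ref - cand),
    }

def top_n_by_common_bits(candidate_fps, reference_fps, n):
    """Indices of the n candidate molecules sharing the most bits with the
    reference modal fingerprint (ties keep their original order)."""
    ref = set().union(*reference_fps)
    ranked = sorted(range(len(candidate_fps)),
                    key=lambda i: len(candidate_fps[i] & ref), reverse=True)
    return ranked[:n]
```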
Distance-based library augmentation
The distance-based library augmentation functionality enables you to select a diverse set of models from a specified library to add to a previously defined library. Both libraries must have been defined using the library definition functionality described above.
The stochastic optimization proceeds like a distance-based diverse selection, except that a subset of compounds (the library being added to) is not allowed to vary and is held fixed throughout the optimization.
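A minimal Python sketch of the idea, under stated assumptions: the objective is taken to be the smallest pairwise Euclidean distance in the combined library (fixed plus selected), and the search is a simple random-swap hill climb that accepts non-worsening swaps. Neither the objective nor the search schedule is taken from the module; they are illustrative choices, and the function names are hypothetical. The key point is that only the selected models are ever swapped, while the fixed library stays in every evaluation.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_pair_distance(points):
    # Assumes at least two points
    return min(dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

def augment(fixed, pool, k, n_steps=2000, seed=0):
    """Choose k models (k >= 1) from pool to add to the fixed library.
    Assumed objective: maximize the smallest pairwise distance in fixed + chosen."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(pool)), k)

    def score(idx):
        return min_pair_distance(fixed + [pool[i] for i in idx])

    best = score(chosen)
    for _ in range(n_steps):
        unused = [i for i in range(len(pool)) if i not in chosen]
        if not unused:
            break
        trial = chosen.copy()
        trial[rng.randrange(k)] = rng.choice(unused)  # swap one selected model
        s = score(trial)
        if s >= best:  # accept non-worsening swaps
            chosen, best = trial, s
    return sorted(chosen), best
```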
The hole identification capabilities are accessed by opening the Find Holes control panel. Do this by selecting the Holes in Property Space/Find Holes menu item on the LIBRARY COMPARISON card in the COMBI-CHEM I deck.
The functionality available includes:
A new column containing the sizes of the holes is added to the study table.
3D fingerprint hole finding and filling
The modal 3D fingerprint of the candidate library is compared with the modal fingerprint of the reference library, reporting the number of OnBits in each library, the number of common bits, the number of OnBits in the candidate library not present in the reference library, and the number of OnBits in the reference library not present in the candidate library.
Distance histogram library comparison
Distance histogram library comparison provides a simple way to evaluate candidate libraries as complements to an existing collection. Libraries can be ranked by the number of interesting compounds they could add to that collection. In addition, the members of a candidate library that do complement the existing set can be separated from the rest of the offering. Distance histogram library comparison can also be used to assess how well a given subset of a library represents the complete set.
The library comparison scheme proceeds as follows: for every compound in the Candidate Library, the distances to all compounds in the Reference Library are computed. Depending on the distance measurement option, the minimum, maximum, or average of these distances is retained. The resulting set of distances can then be plotted individually or as a histogram over the distance range. Results can also be analyzed in a Study Table.
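The scheme above can be sketched in Python. The sketch assumes Euclidean distance in property space and equal-width histogram bins; the function names and the `mode` values are hypothetical.

```python
import math

def dist(a, b):
    # Assumed metric: Euclidean distance in property space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def candidate_distances(candidate, reference, mode="min"):
    """For each candidate compound, keep the min, max, or average of its
    distances to all reference compounds."""
    reducers = {"min": min, "max": max, "avg": lambda ds: sum(ds) / len(ds)}
    reduce_fn = reducers[mode]
    return [reduce_fn([dist(c, r) for r in reference]) for c in candidate]

def histogram(values, n_bins, lo, hi):
    """Equal-width histogram over [lo, hi]; assumes lo <= every value."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)  # values at hi go in the last bin
        counts[b] += 1
    return counts
```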
Analysis of distance histograms
Histograms of the minimum-distance distribution are probably the most useful for identifying library complements. When two candidate libraries are compared against the same reference library, the candidate whose distribution of minimum distances is shifted toward larger values is the better source of complementary compounds.
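One way to make "shifted toward larger values" concrete is to compare a summary statistic of the two minimum-distance distributions. The Python sketch below uses the median and the fraction of compounds beyond a distance cutoff; both criteria and the function names are assumptions for illustration, and the sample distances are toy values rather than real results.

```python
from statistics import median

def fraction_beyond(min_distances, cutoff):
    """Fraction of candidate compounds farther than cutoff from every reference compound."""
    return sum(d > cutoff for d in min_distances) / len(min_distances)

def better_complement(min_d_a, min_d_b):
    """Return 'A', 'B', or 'tie' according to which minimum-distance
    distribution has the larger median (an assumed criterion)."""
    ma, mb = median(min_d_a), median(min_d_b)
    return "A" if ma > mb else "B" if mb > ma else "tie"

# Toy minimum distances for two candidate libraries against one reference library
lib_a = [0.2, 0.4, 0.5, 1.1]
lib_b = [0.9, 1.3, 1.6, 2.0]
```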