Combi-Chem


4f. Compound selection procedures


Identification of outliers

Outliers are identified as those models whose distance to the centroid defined by the full set of models is greater than the average distance to the centroid plus a user-defined number of standard deviations.
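The criterion above can be illustrated outside the program with a minimal Python sketch. The function name `find_outliers`, the parameter `n_std`, and the use of the Euclidean distance and the population standard deviation are assumptions made for illustration; the criterion as stated does not specify them.

```python
import math
import statistics

def find_outliers(points, n_std=2.0):
    """Return indices of points farther from the centroid than
    mean distance + n_std * standard deviation of the distances."""
    dims = len(points[0])
    # Centroid of the full set of models.
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    dists = [math.dist(p, centroid) for p in points]
    mean_d = statistics.fmean(dists)
    # Assumption: population standard deviation; a sample estimate is equally plausible.
    std_d = statistics.pstdev(dists)
    threshold = mean_d + n_std * std_d
    return [i for i, d in enumerate(dists) if d > threshold]

# Hypothetical two-property data: five models near the origin, one far away.
models = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
outliers = find_outliers(models, n_std=1.5)
```

A larger `n_std` makes the test stricter, so fewer models are flagged.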

The functionality to identify and remove outliers in property space is accessed by going to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and selecting the Property Space Outliers menu item. This opens the Property Space Outliers control panel.

Outliers are color-coded in the corresponding 3D plot, which is automatically updated. Options are included to display only outlier rows in the study table and to either remove the outliers from the Observations group or to permanently delete the corresponding rows from the study table.

Selecting diverse compounds

Once the descriptors and their principal components have been calculated, the next step is to select diverse subsets, that is, subsets of substituents or models that uniformly span the property space accessible to the entire set. Several approaches can be taken to select diverse sets. This section covers cluster-based, distance-based, and cell-based techniques.

Cluster-based selections

Selections of diverse compounds may be derived from cluster analysis. This method is particularly well suited to smaller lists of compounds (10 to 1,000), although cluster analysis can also be applied to larger sets. A typical case is the selection of a diverse set of reagents from a reagent list. In such cases, hierarchical cluster analysis (HCA) with either Complete or Average linkage usually provides the best results.

To obtain cluster-based selections, the number of compounds desired should be entered as the number of clusters in the Statistical Method Preferences panel.

On the Study Table, select Preferences/Statistical Method to bring up the Statistical Method Preferences control panel. Set Statistical Method to CLUSTER and Method to HCA/Average Linkage.

Clicking the Get Clusters button cuts the dendrogram so as to provide the desired number of clusters. When classical descriptors or principal components are used, the compound closest to each cluster centroid is identified and selected as the representative of that cluster.
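The choice of one compound per cluster can be sketched as follows. This is an illustration only: it assumes that cluster labels are already available (for example, from an HCA run) and that distances are Euclidean. The name `cluster_representatives` is hypothetical.

```python
import math
from collections import defaultdict

def cluster_representatives(points, labels):
    """Map each cluster label to the index of the member closest to the cluster centroid."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    reps = {}
    for label, idxs in members.items():
        dims = len(points[idxs[0]])
        centroid = [sum(points[i][d] for i in idxs) / len(idxs) for d in range(dims)]
        # Ties are broken in favor of the lowest row index.
        reps[label] = min(idxs, key=lambda i: math.dist(points[i], centroid))
    return reps

# Hypothetical descriptor values and cluster assignments.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (5.0, 5.4)]
labels = [1, 1, 1, 2, 2, 2]
reps = cluster_representatives(points, labels)
```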

When working with fingerprints (ISIS or Daylight), cluster centroids are ill-defined and no compound selection will be performed. However, the Get Clusters procedure will assign a cluster ID to each compound as a new column in the Study Table. There are two work-arounds to this situation:

1.   MDS can be applied to the fingerprints prior to the cluster analysis. The MDS coordinates can then be used as independent variables in the cluster analysis in place of the fingerprints.

2.   The new Cluster ID column created in the Study Table can be used as an independent variable for a cell-based selection. The number of cells specified in the cell-based selection should equal the number of clusters obtained in the previous step. The procedure is equivalent to selecting one compound randomly from each cluster.
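The effect of the second work-around, one compound drawn at random from each cluster, can be sketched as follows. The function name and the `seed` parameter are assumptions for illustration; the cluster IDs stand in for the Cluster ID column of the Study Table.

```python
import random
from collections import defaultdict

def pick_one_per_cluster(cluster_ids, seed=None):
    """Return {cluster ID: row index}, choosing one row at random from each cluster."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        members[cid].append(row)
    return {cid: rng.choice(rows) for cid, rows in members.items()}

# Hypothetical Cluster ID column for six rows.
picks = pick_one_per_cluster([1, 1, 2, 3, 3, 3], seed=0)
```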

Zero-iteration relocation clustering

Zero-iteration relocation clustering is a very fast clustering method, which assigns each molecule to the cluster belonging to the closest seed molecule. The set of seeds is selected as in standard Relocation clustering: randomly or by selecting study table rows. You may also specify the number of relocations to be performed. Standard relocation clustering corresponds to an "unlimited" number of relocations (i.e., the process continues until the relocations stabilize), and zero-relocation clustering is the opposite: no relocations are performed.

The zero-relocation method is very sensitive to the selection of seeds.
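The assignment step, and its sensitivity to the seeds, can be illustrated with a short sketch. It assumes Euclidean distances; the name `assign_to_seeds` is hypothetical, and cluster numbers here are simply positions in the seed list.

```python
import math

def assign_to_seeds(points, seed_rows):
    """Assign every point to the cluster of its closest seed; no relocations are performed."""
    seeds = [points[i] for i in seed_rows]
    return [min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
            for p in points]

# Hypothetical one-property data forming two obvious groups.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

well_spread = assign_to_seeds(points, seed_rows=[0, 3])   # one seed in each group
poorly_spread = assign_to_seeds(points, seed_rows=[0, 1]) # both seeds in the same group
```

With one seed in each group, the two groups are recovered. With both seeds in the first group, the second group is absorbed into the cluster of whichever seed happens to lie nearer.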

Distance-based selections

The selection of highly diverse subsets of models from the library is based on the stochastic optimization of diversity functions using a single-point-mutation Monte Carlo technique. The diversity target functions for distance-based selections are evaluated from inter-compound distance information.
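The general idea of a single-point-mutation Monte Carlo search can be sketched as follows. This is not the program's implementation. The diversity function used here (the smallest inter-compound distance in the subset) and the rule of accepting only non-worsening moves are assumptions chosen for brevity, and all names are hypothetical.

```python
import math
import random

def min_pairwise_distance(points, subset):
    """Assumed diversity function: smallest distance between any two selected compounds."""
    return min(math.dist(points[a], points[b])
               for i, a in enumerate(subset) for b in subset[i + 1:])

def select_diverse(points, n_select, n_steps=2000, seed=0):
    """Stochastic search for a diverse subset; requires 2 <= n_select < len(points)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), n_select)
    score = min_pairwise_distance(points, current)
    for _ in range(n_steps):
        # Single-point mutation: replace one selected compound with an unselected one.
        pos = rng.randrange(n_select)
        candidate = rng.choice([i for i in range(len(points)) if i not in current])
        trial = current.copy()
        trial[pos] = candidate
        trial_score = min_pairwise_distance(points, trial)
        # Assumption: keep the move only if diversity does not decrease.
        if trial_score >= score:
            current, score = trial, trial_score
    return sorted(current), score

# Hypothetical one-property data: three tight groups.
points = [(0.0,), (1.0,), (2.0,), (50.0,), (51.0,), (100.0,)]
selection, diversity = select_diverse(points, n_select=3, seed=1)
```

A practical implementation might also accept some worsening moves to escape local optima. That refinement is omitted here.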

Distance-based diversity selection is accessed by going to the LIBRARY ANALYSIS card in the COMBI-CHEM I deck and selecting the Select Molecules Diverse Distance-based menu item. This opens the Select Diverse control panel.

You can control:

Selected models (rows) are highlighted in the study table and are shown in red in the Cerius2 Models window. If you add to a previously selected set of diverse models, the previously-selected set is shown in green and the newly-selected set in red.

Note

Cell-based selections

In the cell-based approach, the properties defined as independent variables are binned, to divide property space into cells. Diverse models are then obtained by selecting models from the different cells. For each filled cell (cells containing models), the model closest to the cell center is selected as representative of the cell.

The number of cells generated depends on the number of models to select and on the number of properties defined as independent variables. The number of cells is always less than or equal to the number of models to select. Binning is biased towards the properties that vary most: properties with the largest range across the models under consideration tend to receive more divisions than properties that vary little.
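The following sketch illustrates one way such a scheme could work. The bin-allocation rule (repeatedly splitting the property with the widest bins while the cell count stays within the number of models to select) is an assumption, not the program's documented algorithm, and all names are hypothetical.

```python
import math

def allocate_bins(ranges, n_select):
    """Assumed heuristic: give the next division to the property with the widest bins,
    stopping before the number of cells would exceed n_select."""
    bins = [1] * len(ranges)
    while True:
        k = max(range(len(ranges)), key=lambda d: ranges[d] / bins[d])
        trial = bins.copy()
        trial[k] += 1
        if math.prod(trial) > n_select:
            return bins
        bins = trial

def select_by_cells(points, n_select):
    """Return (bins per property, indices of the model closest to the center of each filled cell)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    ranges = [hi - lo for lo, hi in zip(lows, highs)]
    bins = allocate_bins(ranges, n_select)
    widths = [r / b if r > 0 else 1.0 for r, b in zip(ranges, bins)]
    best = {}  # cell index tuple -> (distance to cell center, model index)
    for i, p in enumerate(points):
        # Clamp so that the maximum value falls in the last bin.
        cell = tuple(min(int((p[d] - lows[d]) / widths[d]), bins[d] - 1)
                     for d in range(dims))
        center = [lows[d] + (cell[d] + 0.5) * widths[d] for d in range(dims)]
        dist = math.dist(p, center)
        if cell not in best or dist < best[cell][0]:
            best[cell] = (dist, i)
    return bins, sorted(i for _, i in best.values())

# Hypothetical data: the first property spans 0 to 10, the second only 0 to 1.
points = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (4.0, 0.5),
          (6.0, 0.2), (9.0, 0.5), (10.0, 0.9)]
bins, selected = select_by_cells(points, n_select=4)
```

In this example all four divisions go to the first property, because its range is ten times that of the second.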

Cell-based diversity selection is accessed from the Cell-Based Selection control panel, which is opened by selecting the Select Molecules Diverse Cell-based menu item from the LIBRARY ANALYSIS card in the COMBI-CHEM I deck.

You can control:

Plotting Preferences

The Plot Preferences push button opens the Cells Plot Preferences control panel, which allows you to adjust plotting options.

You can control:

Optimizing existing selections

The distance-based selection technique can be used to optimize a given selection of compounds. If the number of molecules to select (in the Select Diverse panel) is equal to the number of selected rows in the Study Table, the compounds corresponding to the selected rows will be used as the starting point in the optimization process.

Note

For this procedure, the Bin Property Space for Initial Selection option in the Analysis Preferences panel should be deactivated.

Selecting similar models

Searching for similar models is especially important during the optimization phase of library design. After lead compounds with moderate activities have been found, the overall goal in the subsequent lead follow-up phase is to find compounds that resemble the lead compounds. This may be accomplished through rounds of library focusing experiments. The same technique can also be used to find follow-up compounds from commercial compound sources.

The Select Similar control panel enables you to perform similarity selections around a lead compound. Open the Select Similar control panel by selecting the Select Molecules Similar menu item from the LIBRARY ANALYSIS card in the COMBI-CHEM I deck.
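The two similarity criteria offered by the panel, N Closest Molecules and All Molecules within Distance, can be sketched as follows. The sketch assumes Euclidean distances in property space; the function names are hypothetical and are not part of the program.

```python
import math

def n_closest(points, lead, n):
    """Indices of the n models closest to the lead model, nearest first."""
    others = [i for i in range(len(points)) if i != lead]
    return sorted(others, key=lambda i: math.dist(points[i], points[lead]))[:n]

def within_distance(points, lead, radius):
    """Indices of all models within the given distance of the lead model."""
    return [i for i in range(len(points))
            if i != lead and math.dist(points[i], points[lead]) <= radius]

# Hypothetical property values; row 0 is the lead compound.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (0.5, 0.5)]
closest_two = n_closest(points, lead=0, n=2)
neighbors = within_distance(points, lead=0, radius=2.0)
```

The first criterion always returns a fixed number of models. The second returns a variable number that depends on how densely the region around the lead is populated.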

When N Closest Molecules is selected under Similarity Criteria, the Select Similar control panel allows you to:

When All Molecules within Distance is selected under Similarity Criteria, the Select Similar control panel allows you to:

Selecting restrained models

See 4g. Diversity under restraints (C2·LibProfile).

Selecting random compounds

Random selections are often useful in validation procedures to compare a test procedure to a random design process.

Random selections may be obtained by using the Select Diverse panel (Distance-based selections) and providing the number of desired random samples as the number of molecules to be selected.
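As an illustration of how random selections can serve as a baseline in validation, the following sketch averages a diversity score over many random subsets. The score (smallest inter-compound distance) and all names are assumptions made for illustration only.

```python
import math
import random
import statistics

def random_baseline(points, n_select, n_trials=100, seed=0):
    """Mean of the smallest pairwise distance over n_trials random subsets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        subset = rng.sample(range(len(points)), n_select)
        scores.append(min(math.dist(points[a], points[b])
                          for i, a in enumerate(subset) for b in subset[i + 1:]))
    return statistics.fmean(scores)

# Hypothetical one-property data.
points = [(0.0,), (1.0,), (2.0,), (50.0,), (51.0,), (100.0,)]
baseline = random_baseline(points, n_select=3)
```

A selection procedure under test should score noticeably better than this baseline on the same data.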

To obtain a random selection, go to the Analysis Preferences panel (click the Preferences button on the LIBRARY ANALYSIS card) and make sure that:

Compound selection preferences

The Analysis Preferences control panel provides several options for controlling diversity and similarity calculations. Open it by selecting the Preferences menu item from the LIBRARY ANALYSIS card in the COMBI-CHEM I deck.




Last updated May 19, 2000 at 01:52PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.