QSAR



G       Tutorial: CSAR Tutorial


Recursive partitioning

Classification SAR enables fast derivation of partitioning models for the prediction of activities or properties, using algorithms that can handle qualitative data. Classification SAR handles very large datasets, allowing more objective analysis than is practical when using selected experimental data.

1.   Open an example study table

Start with a new Cerius2 session, then go to the LIBRARY ANALYSIS card on the COMBI-CHEM I card deck and select Show Study Table. Select File/Open from the study table and open the file Cerius2-Resources/COMBICHEM/demos/rp/mao.tbl.

This file contains screening data for 1641 structures, classified from 0 (inactive) to 3 (highly active). The study table contains the combichem default descriptors, the E-state keys, and ISIS fingerprints.

2.   Selecting the variables

Select the Activity2 column and set it as the dependent variable by clicking the Y icon in the study table. Select the Charge column and then <Shift>-click the Zagreb column (50 columns to the right) and mark these combichem default descriptors as the independent variables by clicking the X icon.

3.   Building and crossvalidating the recursive partitioning tree

Select Preferences/Statistical Method in the study table. Select RP from the Statistical Method popup and click the RP Options pushbutton to set the following preferences in the Recursive Partitioning Preferences control panel:
Set the Weight To Classes.
Set Score splits using to Twoing Rule.
Set the Nodes Must Contain popup to Min # of Samples and set the value to 10.
Set the Maximum Tree Depth to 5.
Set the Crossvalidation groups to 4 and click the Do Crossvalidation Test action button.
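
For readers unfamiliar with the Twoing Rule option, the following is a minimal sketch of the standard twoing criterion from the CART literature. It assumes that Cerius2 scores candidate splits in the textbook way, which this tutorial does not confirm. The function name twoing_score is hypothetical and is not part of Cerius2.

```python
from collections import Counter


def twoing_score(left_labels, right_labels):
    """Textbook twoing criterion for one candidate binary split.

    Higher values mean the two child nodes have more different class
    distributions. Illustrative only; not taken from Cerius2.
    """
    n_left, n_right = len(left_labels), len(right_labels)
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one side empty separates nothing
    n_total = n_left + n_right
    p_left, p_right = n_left / n_total, n_right / n_total

    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)

    # Sum over classes of |P(class | left) - P(class | right)|
    spread = sum(
        abs(left_counts[c] / n_left - right_counts[c] / n_right)
        for c in classes
    )
    return (p_left * p_right / 4.0) * spread ** 2
```

In this formulation a split that sends every member of one class left and every member of another class right scores highest, while a split whose two sides have identical class proportions scores zero.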

Click the RUN pushbutton in the study table to generate the RP model.
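
The crossvalidation test with 4 groups can be pictured as follows. This is a sketch under the assumption that rows are assigned to groups at random and that each group is held out once; how Cerius2 actually forms its groups is not stated here. The function name crossvalidation_groups and the seed parameter are hypothetical.

```python
import random


def crossvalidation_groups(n_samples, n_groups=4, seed=0):
    """Split row indices 0..n_samples-1 into n_groups disjoint groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so group sizes differ by at most one
    return [indices[g::n_groups] for g in range(n_groups)]


groups = crossvalidation_groups(1641, n_groups=4)

for held_out, test_rows in enumerate(groups):
    # Every row outside the held-out group is used to build the tree
    train_rows = [
        row
        for g, rows in enumerate(groups)
        if g != held_out
        for row in rows
    ]
    # A model would be built on train_rows and scored on test_rows here
```

Each structure is predicted exactly once by a tree that did not see it during building, which is why the crossvalidation results are reported separately from those of the final model.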

Once the run is complete, the results appear in the Table Manager control panel for the crossvalidation experiments and for the final model built from the whole dataset. For each class the results show:

Class %Obs Correct. The percentage of actual members of class X that were predicted to be in class X.

Overall %PredCorrect. The percentage of all objects predicted to be in class X that actually are in class X.

Enrichment. The fraction of objects predicted to be in class X that actually are in class X, divided by the occurrence rate of class X in the dataset as a whole. A value above 1 means that the model selects members of class X more often than random selection would.
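
The three statistics can be reproduced by hand from a list of observed and predicted classes. The sketch below follows the definitions given above; the function name class_report and the dictionary keys are hypothetical, and the example data are invented for illustration rather than taken from mao.tbl.

```python
from collections import Counter


def class_report(observed, predicted):
    """Per-class %Obs Correct, %PredCorrect, and enrichment."""
    n_total = len(observed)
    observed_counts = Counter(observed)
    predicted_counts = Counter(predicted)
    correct_counts = Counter(o for o, p in zip(observed, predicted) if o == p)

    report = {}
    for cls in sorted(observed_counts):
        n_correct = correct_counts[cls]
        n_predicted = predicted_counts[cls]
        base_rate = observed_counts[cls] / n_total
        if n_predicted:
            precision = n_correct / n_predicted
            pct_pred = 100.0 * precision
            enrichment = precision / base_rate
        else:
            # The class was never predicted, so these are undefined
            pct_pred = None
            enrichment = None
        report[cls] = {
            "pct_obs_correct": 100.0 * n_correct / observed_counts[cls],
            "pct_pred_correct": pct_pred,
            "enrichment": enrichment,
        }
    return report


# Invented example: six inactives (0) and two actives (1)
observed = [0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 0]
report = class_report(observed, predicted)
```

In this invented example, one of the two objects predicted to be in class 1 is correct (50%), while class 1 makes up 25% of the data, so the enrichment for class 1 is 2.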

4.   Classifying new structures

Select Molecules/From SD File from the study table. Click Preferences in the Add Molecules from SD File control panel and ensure that all checkboxes in the SD File Preferences control panel are unchecked.
Click Molecule Prefs in the Add Molecules from SD File control panel and make sure that the checkboxes in the Molecule Preferences control panel to Add Hydrogens, Minimize Energy, Calculate Charges, and Generate Conformers are unchecked.
Use the file browser in the Add Molecules from SD File control panel to SELECT the file Cerius2-Resources/COMBICHEM/demos/benzo_625.sd. Choose the Range option, set the range values to 1 and 20, and click the IMPORT MOLECULES button.

As the molecules are imported, the descriptors are calculated and the recursive partitioning model is applied to predict an activity class for each new molecule.
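
Applying a recursive partitioning model to a new molecule amounts to walking the tree from the root to a leaf. The sketch below illustrates this with a toy tree. The tree layout, the thresholds, and the function name classify are all hypothetical; only the descriptor names Charge and Zagreb come from the study table, and the splits shown are not those of the model built in Step 3.

```python
def classify(node, descriptors):
    """Follow splits from the root until a leaf is reached; return its class."""
    while "class" not in node:
        value = descriptors[node["descriptor"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["class"]


# Toy tree with invented thresholds, for illustration only
toy_tree = {
    "descriptor": "Zagreb",
    "threshold": 100.0,
    "left": {"class": 0},
    "right": {
        "descriptor": "Charge",
        "threshold": 0.0,
        "left": {"class": 3},
        "right": {"class": 1},
    },
}

new_molecule = {"Zagreb": 120.0, "Charge": -1.0}
predicted_class = classify(toy_tree, new_molecule)
```

Because only the descriptors that appear in the tree are consulted, a prediction needs at most as many comparisons as the tree is deep (5 with the settings used in this tutorial).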

5.   Working with categorical descriptors

Select File/Reset from the study table to clear the study table.
Re-open the mao.tbl table as in Step 1 above.
Ensure that the X and Y variables are cleared by selecting all columns in the study table (click the top left cell) and clicking the Clear Indep/dep icon.
Now set the ISISKEYS column (at the far right of the table) as the X variable and the Activity2 column as the Y variable.
Check that the settings in the Recursive Partitioning Preferences control panel are the same as in Step 3 above and click the Crossvalidation Test action button and then the RUN pushbutton in the study table.
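
With a key-based descriptor, a natural candidate split is the presence or absence of a single key. The sketch below assumes that each ISISKEYS entry can be treated as a set of key numbers and that splits test one key at a time; the tutorial does not describe how Cerius2 handles the column internally. The function name split_on_key, the key numbers, and the activity values are invented for illustration.

```python
from collections import Counter


def split_on_key(key_sets, labels, key):
    """Tally activity classes for samples with and without a given key."""
    present, absent = Counter(), Counter()
    for keys, label in zip(key_sets, labels):
        if key in keys:
            present[label] += 1
        else:
            absent[label] += 1
    return present, absent


# Invented data: each set lists the keys that are switched on for one structure
key_sets = [{12, 57}, {57}, {12}, set(), {12, 90}]
labels = [3, 0, 2, 0, 3]

present, absent = split_on_key(key_sets, labels, key=12)
```

In this invented data, every structure that carries key 12 is active (class 2 or 3) and every structure without it is inactive, so a split on key 12 would separate the classes cleanly.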


Using CSAR with binary data files

1.   Create the BDF file

Start a new Cerius2 session (from the Visualizer select File/New Session).
Go to the LIBRARY ANALYSIS card on the COMBI-CHEM I card deck and select the Show Study Table menu item.
Select File/Open from the study table and open the file Cerius2-Resources/COMBICHEM/demos/rp/mao.tbl.
Select both the column Activity2 and the range of columns Charge through Zagreb.
From the BINARY DATA FILES card on the COMBI-CHEM I deck, select Create BDF/From Study Table. Change the filename to mao.bdf and click the CREATE BDF button.
Clear the study table.

2.   Running and crossvalidating the RP model

From the BINARY DATA FILES card on the COMBI-CHEM I deck choose Select BDF. Select mao.bdf (click the action button in the corner of the file browser to return to your top-level run directory).
In the listbox on the right side of the Binary Data File control panel, select Activity2 and mark it as the Y variable using the Y icon. Select the items Charge through Zagreb and mark them as the X columns. Select RP from the popup and click the Stat Method Preferences button. Set the method popup to RP and click the RUN button.

3.   Classifying new structures

From the BINARY DATA FILES card on the COMBI-CHEM I deck choose Select BDF. Select Cerius2-Resources/COMBICHEM/demos/benzo_720.bdf. Click the Browse pushbutton and SELECT the mao_rp.dep file. Select the Dependent Properties from button.

This creates predicted-activity columns for the members of the benzo_720 library.

In the Binary Data File control panel, select the RP item from the bottom of the listbox on the right. Click the Export BDF to Table button, choose the Range of Rows radio button in the Export BDF data to Study Table control panel, and specify the first 10 rows. Click the EXPORT TO TABLE pushbutton.
Examine the study table to browse the predicted activities for the structures.




Last updated May 18, 2000 at 05:51PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.