QSAR



B       Tutorial: Building a QSAR equation

QSAR+ is a data exploration and productivity tool that can provide insight into structure-activity relationships. A QSAR (quantitative structure-activity relationship) is a multivariate, mathematical relationship between a set of 2D and 3D physicochemical properties (that is, descriptors) and a biological activity. The QSAR relationship is expressed as a mathematical equation. The analysis of the statistical relationships between molecular structure and various properties provided by Cerius2·QSAR+ facilitates the understanding of how chemical structure and biological activity relate.
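As a rough illustration of what such an equation can look like, the simplest case is a linear combination of descriptors. The symbols below are assumptions introduced here for illustration, not QSAR+ notation: the predicted activity is written as \hat{A}, the descriptor values as x_i, and the fitted coefficients as c_i. Some regression methods may also include nonlinear terms, which this sketch does not show.

```latex
\hat{A} = c_0 + \sum_{i=1}^{n} c_i \, x_i
```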

You can use Cerius2·QSAR+ to help you make informed decisions about which candidate compounds should be considered (based on estimates of biological activity), as well as to help you gain insight into various underlying biological processes. You can also use QSAR+ to provide basic insight into structure-property relationships. This information can be gathered before modeling the atomic-level mechanisms behind these relationships (using other Cerius2 modules). Using the analysis capabilities of the QSAR+ module, you can then correlate the values calculated from the modeling programs with various properties. This correlation ability makes Cerius2·QSAR+ a useful complement to your molecular modeling programs.

This tutorial familiarizes you with Cerius2·QSAR+ by illustrating the step-by-step procedure for building a QSAR equation, including:


Before you begin

You need these modules

To complete this tutorial, you need a licensed copy of Cerius2 that includes these modules:


Entering the molecules in the training set

Your first step is to choose the molecular structures to use as a training set. You can build these structures using the various Cerius2 building and sketching tools. Or you can load structural data from a variety of common file formats generated by other molecular modeling and chemical database software. In this lesson you will load molecules provided with the Cerius2 software, specifically, a set of dopamine beta-hydroxylase inhibitors.

Starting with a new Cerius2 session, select the File/Load Model menu item in the main Cerius2 Visualizer panel and use the file browser to navigate to the directory Cerius2-Resources/EXAMPLES/DBH.

Select all the .msi files from dbh02 through dbh52 (<Shift>-click) and click LOAD to load them into Cerius2.

A total of 47 models should be loaded.

Before you add the 47 models to the QSAR study table, you will select and enter a set of molecular descriptors into the study table. The process of calculating descriptors for a set of molecules is significantly faster if the molecules are added to a study table that already contains the columns corresponding to the molecular descriptors to be calculated.

Entering molecular descriptors

A descriptor is a molecular property that QSAR+ can calculate and use in determining a new QSAR. QSAR+ calculates a wide variety of spatial, electronic, topological, and other descriptors. QSAR+ also enables you to modify existing descriptors and to create or import new descriptors from other Cerius2 modules, such as Molecular Field Analysis (MFA) and Receptor, to meet your specific requirements.

1.   Create an empty study table.

Go to the QSAR deck of cards and select Show Study Table on the QSAR card. This opens a new, empty Study Table control panel.

2.   Add default descriptors

Select the Descriptors/Add Default menu item in the study table to add a set of default descriptors to the study table.

3.   Add other descriptors

You can add more descriptors to the study table:

Open the Descriptors control panel by selecting the Descriptors/Select menu item in the study table.

In the Descriptors control panel, set the Descriptors in family popup to Topological.

Make sure that the other popups are set to Display and All, and click the associated action button to display the descriptors in the table in the Descriptors control panel.

Descriptors 58 (Balaban) to 65 (Zagreb) are displayed.

Set the popup just to the right of the action button to Select and click the action button to Select All descriptors in the Topological family.

The descriptors in the family are highlighted.

Click ADD in the Descriptors control panel to add the selected descriptors to the study table.

An additional set of 29 descriptors from the topological family is added to the study table, giving a total of 49 descriptors.

Load the molecules into the QSAR study table

You can now add the molecules in the Cerius2 models window to the study table.

Select the Molecules/Add All menu item in the study table to add all the molecules in the Models window to the study table.

As each molecule is added, QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization. In addition, all the molecular descriptors are calculated.

Entering biological activity data

Next, you need to enter biological activity data for all the molecules into the study table, the same way as you enter data into any Cerius2 table. That is, you can type the activity data directly into study table cells or copy the activity data from another table. Here, you will enter data directly into the study table.

1.   Add data to the study table.

Click to select a cell in which to enter data (in the Activity column) and then type in the appropriate value from the table below. The value is entered in the cell when you press <Enter> or select another cell.

Table 3. Activity values to enter into study table

Molecule  Activity    Molecule  Activity
dbh02     3.00        dbh28     4.12
dbh04     3.15        dbh29     4.21
dbh06     3.30        dbh30     4.28
dbh07     3.45        dbh31     4.28
dbh08     3.47        dbh32     4.31
dbh09     3.47        dbh33     4.33
dbh10     3.70        dbh34     4.33
dbh11     3.76        dbh35     4.44
dbh12     3.81        dbh36     4.48
dbh13     3.83        dbh37     4.51
dbh14     3.94        dbh38     4.55
dbh15     4.08        dbh39     4.77
dbh16     4.13        dbh40     4.92
dbh17     4.13        dbh41     4.92
dbh18     4.16        dbh42     5.25
dbh19     3.24        dbh44     5.29
dbh20     3.45        dbh45     5.62
dbh21     3.69        dbh46     5.66
dbh22     3.80        dbh48     5.70
dbh23     3.83        dbh49     5.82
dbh24     3.92        dbh50     5.92
dbh25     3.99        dbh51     6.17
dbh26     4.01        dbh52     7.13
dbh27     4.02

Before you generate a QSAR equation, you need to specify which columns in the study table should be used as dependent and independent variables.

1.   Set the dependent variable

Select the column named Activity in the study table by clicking the column heading. Mark this column as a dependent variable (Y) by selecting the Variables/Set Y menu item in the study table.

2.   Set the independent variables

By default, the descriptor columns are automatically marked as independent variables (X) when they are added to the study table. If this didn't happen, select all the descriptor columns, from Charge to Zagreb, in the study table. Mark these columns as independent variables by selecting the Variables/Set X menu item in the study table menubar.
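Outside the study table, the same split into dependent and independent variables can be sketched in a few lines of Python. This is only an illustration of the idea, not how QSAR+ stores its data: the activity values are taken from Table 3, but the descriptor values and the list-of-dicts layout are assumptions made up for this example.

```python
# Hypothetical miniature study table: one dict per molecule (row).
# Activity values come from Table 3; the descriptor values are invented
# for illustration and are NOT values calculated by QSAR+.
study_table = [
    {"Molecule": "dbh02", "Activity": 3.00, "Balaban": 1.9, "Zagreb": 40.0},
    {"Molecule": "dbh04", "Activity": 3.15, "Balaban": 2.1, "Zagreb": 46.0},
    {"Molecule": "dbh06", "Activity": 3.30, "Balaban": 2.0, "Zagreb": 52.0},
]

y_column = "Activity"              # dependent variable (Y)
x_columns = ["Balaban", "Zagreb"]  # independent variables (X)

# One Y value per molecule, and one row of X values per molecule.
y = [row[y_column] for row in study_table]
X = [[row[name] for name in x_columns] for row in study_table]

print("Y:", y)
print("X:", X)
```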

Exploring the data

You can now analyze the dependent and independent variables using the statistical and graphics tools available in QSAR.

Generate histograms of selected variables by selecting one or more of the columns and selecting the Tools/Graphics/Histogram Plots menu item in the study table menubar.

Note

Every histogram you make occupies one of the "plot slots" in the graphs gallery. The maximum number of plots you can have in the gallery is 49. If you try to make more than 49 plots, messages appear in the Cerius2 text window warning you that the maximum number of plots has been exceeded. If this happens, click the Reset command on the GRAPHS card (TABLES & GRAPHS deck) to empty the graph gallery.
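To see what a histogram of the Activity column summarizes, the binning can be sketched in plain Python. The values are the 47 activities from Table 3. The bin width of 0.5 is an assumption chosen for this example, and QSAR+ may bin the data differently.

```python
import math
from collections import Counter

# Activity values from Table 3 (47 molecules).
activity = [
    3.00, 3.15, 3.30, 3.45, 3.47, 3.47, 3.70, 3.76, 3.81, 3.83, 3.94, 4.08,
    4.13, 4.13, 4.16, 3.24, 3.45, 3.69, 3.80, 3.83, 3.92, 3.99, 4.01, 4.02,
    4.12, 4.21, 4.28, 4.28, 4.31, 4.33, 4.33, 4.44, 4.48, 4.51, 4.55, 4.77,
    4.92, 4.92, 5.25, 5.29, 5.62, 5.66, 5.70, 5.82, 5.92, 6.17, 7.13,
]

bin_width = 0.5  # assumed bin width, for illustration only

# Count how many values fall into each bin [k*width, (k+1)*width).
counts = Counter(math.floor(value / bin_width) for value in activity)

# Print a simple text histogram, one row per occupied bin.
for k in sorted(counts):
    low = k * bin_width
    print(f"{low:4.1f} to {low + bin_width:4.1f}  {'#' * counts[k]}")
```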

Generate rune plots of selected variables by selecting one or more of the columns and selecting the Tools/Graphics/Rune Plots menu item in the study table. If no column is selected, rune plots for all the independent variables are generated.

Calculate descriptive statistics for all dependent and independent variables by selecting the Tools/Statistical/Summary Statistics menu item in the study table.

The statistics are calculated before the Descriptive Statistics control panel appears.
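For comparison, typical descriptive statistics for the Activity column can be computed with the Python standard library. The values are those of Table 3. Which statistics the Descriptive Statistics control panel actually reports is not stated here, so the selection below is an assumption.

```python
import statistics

# Activity values from Table 3 (47 molecules).
activity = [
    3.00, 3.15, 3.30, 3.45, 3.47, 3.47, 3.70, 3.76, 3.81, 3.83, 3.94, 4.08,
    4.13, 4.13, 4.16, 3.24, 3.45, 3.69, 3.80, 3.83, 3.92, 3.99, 4.01, 4.02,
    4.12, 4.21, 4.28, 4.28, 4.31, 4.33, 4.33, 4.44, 4.48, 4.51, 4.55, 4.77,
    4.92, 4.92, 5.25, 5.29, 5.62, 5.66, 5.70, 5.82, 5.92, 6.17, 7.13,
]

# A small, assumed selection of summary statistics.
summary = {
    "n": len(activity),
    "mean": statistics.mean(activity),
    "median": statistics.median(activity),
    "stdev": statistics.stdev(activity),  # sample standard deviation
    "min": min(activity),
    "max": max(activity),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```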

Generate a QSAR equation

You are now ready to generate a QSAR equation. Several regression methods are available in QSAR, including multiple linear regression, partial least squares (PLS), simple linear regression, stepwise multiple linear regression, principal components regression (PCR), genetic function approximation (GFA), and genetic partial least squares (G/PLS). In this session you will use the GFA method.

Select GFA in the Methods popup in the study table. Then click the RUN pushbutton to start a GFA calculation with the default parameters.

The GFA calculation takes a few minutes.
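GFA itself is too involved for a short example, but the simplest method in the list above, simple linear regression, shows what fitting an equation means. The sketch below is an ordinary least-squares fit written from scratch. It is not the QSAR+ implementation, and both the descriptor and activity values are synthetic numbers invented for this example.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data for illustration only (not from the DBH training set).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [3.1, 3.9, 5.2, 5.8, 7.0]

slope, intercept = fit_line(descriptor, activity)
print(f"activity = {slope:.3f} * descriptor + {intercept:.3f}")
```

The multivariate methods extend the same idea to many descriptors at once, and GFA additionally searches over which descriptors to include.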

Analyzing the QSAR equation

The GFA calculation performed in the previous step results in a set of 99 QSAR equations. You can analyze each of these equations with the Equation Viewer control panel.

1.   View the equation terms, coefficients, and statistics

Open the Equation Viewer control panel (if it does not appear automatically) by selecting the Tools/Equation Viewer... menu item in the study table.

Click an equation row number in the upper table in the Equation Viewer control panel to display the terms, coefficients, and statistics for that equation in the lower part of the control panel.

2.   Connect the 2D plot to the equation viewer

Click the More... button in the QSAR Equation section of the Equation Viewer control panel to open the QSAR equations control panel for setting preferences. Then check the Auto update 2D Plot checkbox. Now, every time you select a QSAR equation in the Equation Viewer control panel, the corresponding predicted vs. observed activity plot is automatically updated.

You may want to move the Graph window so that it doesn't overlap the Equation Viewer control panel.

3.   Investigate the plot-equation relationship

You can also identify points in the 2D plot with molecules in the QSAR study table:

Select QSAR equation number 1 in the Equation Viewer control panel and click the Plot Equation action button (if you did not set the preference in the previous step). The 2D plot of predicted vs. observed activity is updated to show equation number 1.

Select a few points in the 2D plot (by dragging out a selection rectangle). Selected points are highlighted in yellow. Now click the Show selected points action button in the Equation Viewer control panel.

The rows corresponding to the selected points in the 2D plot are highlighted in the study table. In addition, the corresponding molecules appear in the models window, and information about the selected molecules is printed in the text window.

You can also go the other way: select molecules in the study table and see where they are in the 2D plot:

Select a few rows in the study table (use <Ctrl>- or <Shift>-click). In the Equation Viewer control panel, click the Plot Equation action button. The 2D plot shows the points corresponding to the selected rows in red and other points in green.
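The agreement shown in a predicted vs. observed plot can also be summarized as a single number, the coefficient of determination (r squared). The following is a minimal sketch of that calculation. The observed values are the first four activities from Table 3, while the predicted values are hypothetical and do not come from any QSAR+ equation.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [3.00, 3.15, 3.30, 3.45]   # dbh02, dbh04, dbh06, dbh07 (Table 3)
predicted = [3.05, 3.10, 3.35, 3.40]  # hypothetical predictions

r2 = r_squared(observed, predicted)
print(f"r^2 = {r2:.3f}")

# Points far from the diagonal of the plot have large residuals.
residuals = [o - p for o, p in zip(observed, predicted)]
print("residuals:", [round(r, 2) for r in residuals])
```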

Saving the QSAR equations

QSAR+ allows you to save the QSAR equations generated in the current session for later retrieval and use.

Open the Save QSAR Equations control panel by clicking the Save Equations... button at the top of the Equation Viewer control panel.

In the Save QSAR Equations control panel, set the popup to Current Equations Set, enter appropriate names and comments in the corresponding boxes, and enter testset.qsar in the QSAR equations file entry box. Then click Save.

The entire set of 99 GFA equations is saved in the file testset.qsar. You can read in QSAR equations saved in .qsar files into the equation viewer by using the Open Equations button.

Predicting the activity of new molecules

Once you have calculated a QSAR equation, it is easy to use it to predict the activity of a molecule outside the training set.

1.   Load the new molecule

To calculate the activity of a new molecule, all you need to do is add it to the study table that contains a column representing the QSAR equation and the original descriptors used to generate the equation. We illustrate the procedure by calculating the activity of a copy of one of the molecules used in the training set and confirming that the value calculated by the QSAR equation is the same as that for the original molecule.

Select the File/Load Model menu item in the Cerius2 Visualizer and navigate to the same directory you used at the beginning of the session:
Cerius2-Resources/EXAMPLES/DBH

Select the dbh02.msi file and click LOAD to load the molecule into Cerius2. The copy of dbh02 is named dbh02_1.

2.   Add the new model to the study table

Make sure that dbh02_1 is current in the Models window and add it to the study table by selecting the Molecules/Add Current menu item in the study table.

The new molecule is added at the bottom of the study table. QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization (as for the original molecules). All the descriptors are automatically calculated, including the QSAR equation column (GFA Predicted Activity), which should show the same value as for the original dbh02 molecule in row 1.

If the cell shows <Pending>, select the new row, then <Ctrl>-click the <Pending> columns (so that the row selection is kept), and select Descriptors/Recalc Descriptors from the study table to compute the values.
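The reason the copy receives the same predicted activity is that a QSAR equation depends only on the descriptor values. This can be sketched for a linear equation as follows. The intercept, the coefficients, and the descriptor values are all hypothetical and are not taken from an actual GFA result.

```python
def predict_activity(intercept, coefficients, descriptors):
    """Apply a linear QSAR equation to one molecule's descriptor values."""
    return intercept + sum(
        coeff * descriptors[name] for name, coeff in coefficients.items()
    )


# Hypothetical linear equation (NOT an equation produced by QSAR+).
intercept = 2.5
coefficients = {"Balaban": 0.4, "Zagreb": 0.01}

# Hypothetical descriptor values for the original molecule and its copy.
dbh02 = {"Balaban": 1.9, "Zagreb": 40.0}
dbh02_1 = dict(dbh02)  # identical structure, hence identical descriptors

original = predict_activity(intercept, coefficients, dbh02)
copy = predict_activity(intercept, coefficients, dbh02_1)
print(f"dbh02: {original:.3f}   dbh02_1: {copy:.3f}")
```

Editing the molecule changes its descriptor values, which is why the predicted activity changes when the descriptors are recalculated.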

3.   Modify an existing molecule

QSAR+ allows you to easily inspect the effect of chemical changes in an existing molecule (a molecule already in the study table) on the predicted activity value.

Open the Molecule Preferences control panel by selecting the Preferences/Molecules menu item in the study table.

Check the Recalculate Descriptors When Models are Edited checkbox in the Molecule Preferences control panel.

Open the Sketcher control panel by selecting the Build/3D-Sketcher menu item in the Visualizer main panel.
Use the Sketcher to change the sulfur atom in the dbh02_1 model to an oxygen: Select the Edit Element tool and set the Element entry box to O in the Sketcher control panel. Then click the sulfur atom in the dbh02_1 model.

Immediately after picking the sulfur atom, QSAR+ checks and fixes the number of hydrogens, recalculates charges, minimizes the molecule, and recalculates the descriptors in the study table corresponding to model dbh02_1.

The new value obtained for the predicted activity should be close to 3.679.

Saving the study

You can save your QSAR study, including molecules and the QSAR study table, using the Cerius2 Save Session function:

Select the File/Save Session menu item in the Cerius2·Visualizer. Enter a name for the session you want to save (such as qsar_tutor.mss) and click the SAVE button. The Cerius2 session is saved for later retrieval.

4.   Finish up

To end the Cerius2 session, close all open control panels and select File/Exit from the Visualizer menu bar.
If you want to go on to another tutorial or use Cerius2 to run an experiment, close all control panels and select File/New Session from the Visualizer menu bar.


Summary

This tutorial familiarized you with QSAR+ by illustrating the steps you could perform to build a QSAR equation, including:

As you become more experienced with using Cerius2·QSAR+ to build QSAR equations, you may want to experiment with the items in the QSAR menu card (which are the same as in the study table menubar). This enables you to perform tasks in the order that best suits your type of analysis, as well as to control the amount of processing performed automatically by QSAR+.




Last updated May 18, 2000 at 05:51PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.