QSAR



2       QSAR+ QuickStart

QSAR+ is a data exploration and productivity tool that can provide insight into structure-activity relationships. A QSAR (quantitative structure-activity relationship) is a multivariate mathematical relationship between a set of 2D and 3D physicochemical properties (descriptors) and a biological activity, expressed as an equation. By analyzing the statistical relationships between molecular structure and various properties, QSAR+ helps you understand how chemical structure and biological activity relate.
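For illustration only: in its simplest, linear form, a QSAR equation expresses the activity as a weighted sum of descriptor values. The symbols below are generic, not QSAR+ names, and equations produced by some methods (GFA, for example) may also contain nonlinear terms.

```latex
\text{Activity} = c_0 + \sum_{i=1}^{n} c_i \, x_i
```

Here x_i is the value of the i-th descriptor for a molecule, c_i is the fitted coefficient for that descriptor, and c_0 is a constant term.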

You can use QSAR+ to help make informed decisions about which candidate compounds should be considered (based on estimates of biological activity), as well as to help gain insight into various underlying biological processes. You can also use QSAR+ to provide basic insight into structure-property relationships. You might, for instance, first use other Cerius2 modules to model the atomic-level mechanisms behind quantitative structure-activity relationships. Using the analytical capabilities available in QSAR+, you can then correlate the values calculated from the modeling programs with various properties. This correlation ability makes QSAR+ a useful complement to your molecular modeling programs.

This chapter familiarizes you with QSAR+ by illustrating the procedure for building a QSAR equation using C2·QSAR+.


Understanding the QSAR generation process

This section describes the general procedure for generating a QSAR, which consists of the following 11 steps:

1. Identify the training set

The first step is to choose the molecular structures to use as the training set. QSAR+ provides tools that enable you to build new structures, create a congeneric series of 3D structures, and import chemical structure files in a wide variety of formats.

2. Enter biological activity data

For each of the molecules in the training set, you must provide information about the observed biological activity associated with that molecule.

3. Generate conformations

If you are performing a 3D QSAR analysis, you usually need conformational information, which is typically obtained by performing a conformational search. You can choose from a variety of conformation generation methods.

4. Calculate descriptors

QSAR+ calculates a wide variety of spatial, electronic, topological, information-content, thermodynamic, conformational, quantum mechanical, and shape descriptors. You can modify existing descriptors and change which descriptors make up a descriptor set. To meet your specific requirements, you can also create new descriptors or import them from other Cerius2 modules such as Molecular Field Analysis (MFA) and Receptor.

5. Explore the data

You can generate graphs to depict descriptor distributions. If the descriptor distributions have holes, you can choose new compounds to fill them. You can also display correlation matrices to identify highly correlated descriptors, and histograms and rune plots to examine the uniformity of your data. Descriptive statistics are available to further characterize descriptors, and you can transform and normalize descriptors as appropriate. Principal components analysis (PCA) and cluster analysis provide further ways to characterize your data.
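As a rough illustration of how a correlation matrix helps identify highly correlated descriptors, the following Python sketch computes the Pearson correlation coefficient for every pair of descriptor columns and lists the pairs at or above a cutoff. The descriptor names, values, and the 0.9 threshold are hypothetical; QSAR+ may use a different correlation measure or cutoff.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of descriptor columns."""
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(columns, 2)
    }


def highly_correlated(columns, threshold=0.9):
    """List descriptor pairs whose |r| meets or exceeds the threshold."""
    return [
        pair for pair, r in correlation_matrix(columns).items()
        if abs(r) >= threshold
    ]


# Hypothetical descriptor values for four molecules (illustrative only).
descriptors = {
    "DescA": [1.0, 2.0, 3.0, 4.0],
    "DescB": [2.1, 3.9, 6.2, 7.8],  # nearly proportional to DescA
    "DescC": [5.0, 1.0, 4.0, 2.0],
}
print(highly_correlated(descriptors))
```

In a real study, one descriptor from each flagged pair is often dropped or the pair is handled by a method such as PLS, since near-duplicate descriptors add little independent information.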

6. Generate a QSAR equation

After you identify the appropriate dependent and independent variables, you can choose from several statistical methods for generating a QSAR equation. These include multiple linear regression, partial least squares (PLS), simple linear regression, stepwise multiple linear regression, and principal components regression (PCR). Additionally, if the genetic function approximation (GFA) functionality is installed, you can also perform a genetic analysis, either GFA or G/PLS, to create a QSAR equation.
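Of the methods listed, multiple linear regression is the simplest to sketch. The Python example below fits the coefficients of a linear equation by ordinary least squares, solving the normal equations with Gaussian elimination. It is an illustration under that assumption, not the QSAR+ implementation. Note that the normal equations can become ill-conditioned when descriptors are highly correlated, which is one motivation for methods such as PLS and PCR.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to improve numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y):
    """Fit y = c0 + c1*x1 + ... + ck*xk by ordinary least squares.

    Solves the normal equations (A^T A) c = A^T y, where A is X with a
    leading column of ones for the constant term c0.
    """
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coeffs, row):
    """Evaluate the fitted equation for one row of descriptor values."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [3, 4, 6, 8, 9]
coeffs = fit_mlr(X, y)
print(coeffs)
print(predict(coeffs, [3, 2]))
```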

7. Validate the equation

You can apply validation techniques to identify outliers and leverage points. You can also use graphic analyses and crossvalidation to characterize the robustness of the QSAR.
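Crossvalidation can be illustrated with the leave-one-out scheme: each molecule is removed in turn, the model is refit on the remaining molecules, and the squared prediction error for the removed molecule is added to PRESS (the predictive residual sum of squares). The crossvalidated r² (often written q²) is then 1 - PRESS / SS_total. The Python sketch below assumes a one-descriptor linear model for brevity; the exact crossvalidation scheme and statistics that QSAR+ reports may differ.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_q2(x, y):
    """Leave-one-out crossvalidated r^2 (q^2) for a one-descriptor linear model.

    Each molecule is left out once, the model is refit on the rest, and the
    squared prediction error for the left-out molecule is added to PRESS.
    Returns (q2, press).
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / n
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total, press


# Hypothetical descriptor and activity values (illustrative only).
print(loo_q2([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.1]))
```

A q² much lower than the ordinary r² of the same model is a common warning sign that the equation fits the training set better than it predicts.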

8. Analyze the equation

You can use graphical tools to plot observed vs. predicted activities and to identify outliers. You can also generate 3D plots to visualize the positions of important 3D-QSAR descriptors from Molecular Field Analysis (MFA) or Receptor Surface Analysis (RSA) in relation to the molecules.
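One simple way to identify outliers from an observed vs. predicted comparison is to flag molecules whose residual (observed minus predicted activity) is large relative to the spread of all residuals. The Python sketch below uses a cutoff of two standard deviations; that cutoff, and the molecule names and values, are assumptions for illustration, not necessarily what QSAR+ uses.

```python
import statistics


def flag_outliers(names, observed, predicted, cutoff=2.0):
    """Return (name, residual) for molecules whose residual
    (observed - predicted) exceeds `cutoff` standard deviations
    of all residuals."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    sd = statistics.stdev(residuals)  # sample standard deviation
    return [
        (name, round(r, 3))
        for name, r in zip(names, residuals)
        if abs(r) > cutoff * sd
    ]


# Hypothetical molecules: nine small residuals and one large one.
names = [f"mol{i}" for i in range(1, 11)]
predicted = [4.0] * 10
observed = [4.1, 3.9, 4.1, 3.9, 4.1, 3.9, 4.1, 3.9, 4.0, 5.0]
print(flag_outliers(names, observed, predicted))
```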

9. Save the QSAR equation

You can save calculated QSAR equations in QSAR+ equation databases for later use.

10. Predict activity

You can now use a calculated QSAR equation to predict biological activity of compounds. In Cerius2·QSAR+, you can simply draw a candidate structure, add it to your study, apply your calculated QSAR equation, and immediately view the predicted activity.

11. Save the study

When you are finished with your study, you can save the entire QSAR analysis, including all its component structures and conformations, for later review and use.


Using QSAR+

Cerius2·QSAR+ does not impose a specific order in which you must perform tasks. Instead, it is designed to serve both users who are new to the QSAR generation process and those who are experienced with it.

All QSAR+ commands can be accessed from the menu cards in the QSAR card deck. (The menu cards in this deck vary, depending on the modules you have licensed.) Experienced users, and novices who want to, can select items from these cards to perform tasks in whatever order best suits their analysis, to change settings, and to control how much processing QSAR+ performs automatically.

Before you begin

The software must be ready. That is:

• The appropriate Cerius2 software must be installed and running.

• A properly licensed copy of QSAR+ must be installed.


Creating a training set

Creating a training set means entering into Cerius2 the chemical structure of each molecule to be used in building a QSAR. You can build these structures with the Cerius2 building and sketching tools, or load structural data from files in a variety of common formats generated by other molecular modeling and chemical database software. In this session you load molecules already saved in the Cerius2 Resources directories: a set of dopamine beta-hydroxylase inhibitors.

1.   Load the molecules into Cerius2

Starting with a new Cerius2 session, select the File/Load Model menu item in the Cerius2 Visualizer panel and use the file browser to navigate to the directory Cerius2-Resources/EXAMPLES/DBH.

Select all the .msi models contained in this directory, from dbh02 to dbh52, and click LOAD to load them into Cerius2.

A total of 47 models should be loaded and also listed in the Cerius2 Model Manager.

2.   Load the molecules into the QSAR study table

Select the QSAR deck of cards and select the Show Study Table menu item on the QSAR card. This opens the QSAR Study Table control panel.

Select the Molecules/Add All menu item in the study table to add all the molecules in the Models window to the study table. You can also do this by clicking the add molecules icon in the study table toolbar.

As each molecule is added, QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization, using either the default settings or criteria you specify for each of these tasks. You can modify these settings by selecting the Preferences/Molecules menu item in the study table menubar.

Entering biological activity data

QSAR+ needs biological activity data for each molecule in the training set. You enter these data in the same way that you enter data into any Cerius2 table: you can type values directly into the study table cells or copy them from another table. In this session you type the data directly into the study table.

Use the mouse to select the cell in the Activity column for a molecule, then type that molecule's value from Table 1 (below). Typed characters appear both in the cell and in the edit window at the top of the study table, where data are edited and formatted. The formatted value is entered into the cell when you press <Enter> or select another table cell.

Table 1. Activity data

Molecule   Activity     Molecule   Activity
dbh02      3.00         dbh28      4.12
dbh04      3.15         dbh29      4.21
dbh06      3.30         dbh30      4.28
dbh07      3.45         dbh31      4.28
dbh08      3.47         dbh32      4.31
dbh09      3.47         dbh33      4.33
dbh10      3.70         dbh34      4.33
dbh11      3.76         dbh35      4.44
dbh12      3.81         dbh36      4.48
dbh13      3.83         dbh37      4.51
dbh14      3.94         dbh38      4.55
dbh15      4.08         dbh39      4.77
dbh16      4.13         dbh40      4.92
dbh17      4.13         dbh41      4.92
dbh18      4.16         dbh42      5.25
dbh19      3.24         dbh44      5.29
dbh20      3.45         dbh45      5.62
dbh21      3.69         dbh46      5.66
dbh22      3.80         dbh48      5.70
dbh23      3.83         dbh49      5.82
dbh24      3.92         dbh50      5.92
dbh25      3.99         dbh51      6.17
dbh26      4.01         dbh52      7.13
dbh27      4.02

At this point in the process of building a QSAR equation, you have a study table containing both the chemical structure and the biological activity for each molecule in the training set. In addition, QSAR+ has (by default) added hydrogens, performed energy minimizations, and calculated charges for each of these molecules. You are now ready to calculate descriptors, the next step toward generating a QSAR equation.

Calculating descriptors

A descriptor is any of a number of built-in molecular properties that QSAR+ can calculate and use in determining new QSAR relationships. QSAR+ provides a wide variety of spatial, electronic, topological, information-content, thermodynamic, conformational, quantum mechanical, and shape descriptors, and descriptor data can also be imported from other Cerius2 modules, including Receptor and MFA. Different groups of descriptors are designated as the defaults for different applications (QSAR, Diversity, QSPR). In this session you use the default descriptors for the QSAR application.

Make sure that the default options corresponding to the QSAR application are set by selecting the Preferences/Default Set/QSAR menu item in the study table menubar.

Select the Descriptors/Add Default menu item in the study table menubar to add a set of default descriptors to the study table. You can also do this by clicking the add descriptors icon in the study table toolbar.

At this point 20 molecular descriptors are added to the study table, and their values are calculated for each molecule present in the table.

Setting dependent and independent variables and exploring the data

Before you generate a QSAR equation you need to specify which columns in the study table should be used as dependent and independent variables.

1.   Set the dependent variable

Select the column named Activity in the study table by clicking the column heading. Mark this column as the dependent variable (Y) by selecting the Variables/Set Y menu item in the study table menubar. You can also do this by clicking the Y icon in the study table toolbar.

2.   Set the independent variables

By default, descriptor columns are automatically marked as independent variables (X) when they are added to the study table. If this does not happen, select all the descriptor columns (from Charge to MolRef) by <Shift>-clicking their column headings, then mark them as independent variables by selecting the Variables/Set X menu item in the study table menubar. You can also do this by clicking the X icon in the study table toolbar.

3.   Explore the data

You can now analyze the dependent and independent variables using the statistical and graphics tools available in QSAR+.

Generate histograms of selected variables by selecting one or more columns and selecting the Tools/Graphic/Histogram Plot menu item in the study table menubar. If no column is selected, histogram plots for all the independent variables are generated. You can also do this by clicking the histograms icon in the study table toolbar.

Generate rune plots of selected variables by selecting one or more columns and selecting the Tools/Graphic/RunePlot menu item in the study table menubar. If no column is selected, rune plots for all the independent variables are generated. You can also do this by clicking the rune plots icon in the study table toolbar.

Calculate descriptive statistics for all dependent and independent variables by selecting the Tools/Statistical/Summary Statistics menu item in the study table menubar. You can also do this by clicking the summary statistics icon in the study table toolbar.
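The kind of descriptive statistics described above can be approximated with Python's standard statistics module. This sketch summarizes one column, here the first six activity values from Table 1 (dbh02 through dbh09). Which statistics QSAR+ actually reports, and whether it uses the sample or population standard deviation, are assumptions not confirmed by this chapter.

```python
import statistics


def summary_statistics(column):
    """Basic descriptive statistics for one study-table column."""
    return {
        "n": len(column),
        "mean": statistics.mean(column),
        "std_dev": statistics.stdev(column),  # sample standard deviation (n - 1)
        "min": min(column),
        "max": max(column),
        "median": statistics.median(column),
    }


# First six activity values from Table 1 (dbh02, dbh04, dbh06, dbh07, dbh08, dbh09).
activity = [3.00, 3.15, 3.30, 3.45, 3.47, 3.47]
for name, value in summary_statistics(activity).items():
    print(f"{name:8s} {value:.3f}")
```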


Generating a QSAR equation

You are now ready to generate a QSAR equation. Several regression methods are available in QSAR+, including multiple linear regression, partial least squares (PLS), simple linear regression, stepwise multiple linear regression, principal components regression (PCR), genetic function approximation (GFA), and G/PLS. In this session you use the GFA method.

Set the Methods popup to GFA. (This popup is the one next to the RUN pushbutton in the study table.) Then click the RUN pushbutton to start the GFA calculation with the default parameters.

The GFA calculation takes a few minutes. It generates a set of 99 or 100 QSAR equations, which are loaded into the Equation Viewer control panel, sorted by the lack-of-fit (LOF) parameter. By default, the first (best) equation is validated using crossvalidation. This QSAR equation is automatically inserted as a new column in the study table (GFA Predicted Activity), along with a column of residuals (observed minus predicted activity values, GFA Residuals Activity). A plot of predicted vs. observed activity values is displayed in the plot window, and the crossvalidation results are shown in the text window.
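In the GFA literature, the lack-of-fit score used to rank equations is Friedman's LOF, which penalizes the least-squares error of an equation according to how many terms it uses, so an added term only helps if it reduces the error enough. The Python sketch below assumes the commonly cited form LOF = LSE / (1 - (c + d·p)/M)², with LSE taken as the mean squared residual. The default value of the smoothing parameter d, and exactly how QSAR+ counts c and p, are assumptions to check against the GFA reference documentation. Lower LOF is better, consistent with the first equation in the sorted list being the best.

```python
def lack_of_fit(observed, predicted, n_basis, n_features, smoothing=1.0):
    """Friedman-style lack-of-fit (LOF) score (assumed form, see lead-in).

    LOF = LSE / (1 - (c + d*p) / M)**2, where LSE is the mean squared
    residual, c the number of basis functions (terms other than the
    constant), p the number of features (descriptors) in those terms,
    d the smoothing parameter, and M the number of samples.
    """
    m = len(observed)
    lse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / m
    penalty = 1.0 - (n_basis + smoothing * n_features) / m
    if penalty <= 0:
        raise ValueError("model has too many terms for the number of samples")
    return lse / penalty ** 2


# Two hypothetical equations with identical residuals but different sizes:
# the equation with more terms receives the worse (higher) LOF.
observed = [1.1] * 10
predicted = [1.0] * 10
lof_small = lack_of_fit(observed, predicted, n_basis=2, n_features=2)
lof_large = lack_of_fit(observed, predicted, n_basis=4, n_features=4)
print(lof_small, lof_large)
```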

As it validates an equation, QSAR+ displays information about the validation in the Cerius2 text window. This information includes:


Analyzing the QSAR equations

The GFA calculation performed in the previous step resulted in a set of 99 or 100 QSAR equations. You can analyze each of them using the Equation Viewer control panel.

Make sure the Equation Viewer control panel is visible by selecting the Tools/Equation Viewer menu item in the study table menubar.

Click the equation number row in the upper table in the Equation Viewer control panel (or select the equation number in the Number box) to display the terms, coefficients, and statistics for that equation.

Click the More... button in the QSAR Equation section of the Equation Viewer control panel to open the preferences control panel for QSAR equations. Then check the Auto update 2D Plot checkbox.

Now, every time you select a QSAR equation in the Equation Viewer control panel, the corresponding predicted vs. observed activity plot is automatically updated.

You can also identify points in the 2D plot with molecules in the QSAR study table:

Select QSAR equation number 1 in the Equation Viewer control panel and click the Plot Equation action button. The 2D plot of predicted vs. observed activity should be updated.

Use the mouse to select a few points in the 2D plot (by dragging). Selected points should be highlighted in yellow. Now click the Show selected points action button in the Equation Viewer control panel.

The rows corresponding to the selected points in the 2D plot are highlighted in the study table. In addition, the corresponding molecules are made visible in the Models window, and information about the selected molecules is printed in the text window.

You can also go the other way: select molecules in the study table and see where they are in the 2D plot:

Select a few rows in the study table (by <Ctrl>- or <Shift>-clicking the row headings). Go back to the Equation Viewer control panel and click the Plot Equation action button. The 2D plot now shows the points corresponding to the selected rows in red, and other points in green.


Saving the QSAR equations

QSAR+ allows you to save the QSAR equations for later retrieval and use.

Open the Save QSAR Equations control panel by clicking the Save Equations pushbutton in the Equation Viewer control panel.

In the Save QSAR equations control panel, set the popup to Current Equations Set, enter a name and comments in the corresponding boxes, and enter testset.qsar in the QSAR equations file entry box. Then click Save.

The entire set of GFA equations is saved in the file testset.qsar.

You can read QSAR equations saved in .qsar files back into the Equation Viewer:

Open the Open QSAR equations control panel by clicking the Open Equations pushbutton in the Equation Viewer control panel.

In the Open QSAR equations control panel, set the upper popup to Equations Set and enter a name for the new set (such as newset). Then use the file browser to find the file you just created, testset.qsar, and select it either by highlighting it and clicking SELECT or by double-clicking the filename. Information about the file is displayed in the Open QSAR equations control panel. Now click Open to read the equations into the Equation Viewer.


Predicting activity of new molecules

Once you have calculated a QSAR equation, it is easy to use it to predict the activity of a molecule outside the training set.

1.   Activity of a new molecule

To calculate the activity of a new molecule, all you need to do is add it to a study table that contains both a column representing the QSAR equation and the original descriptors used to generate the equation. The following steps illustrate the procedure by calculating the activity of a copy of one of the training-set molecules and confirming that the QSAR equation gives the same value as for the original molecule:

Select the File/Load Model menu item in the Cerius2 Visualizer panel and navigate to the same directory you used at the beginning of the session:
Cerius2-Resources/EXAMPLES/DBH

Select the dbh02.msi file and click LOAD to load the molecule into Cerius2. The copy of dbh02 is named dbh02_1.

Make sure that model dbh02_1 is the current model in the Models window and add it to the study table by selecting the Molecules/Add Current menu item in the study table menubar.

The new molecule is added at the bottom of the study table. QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization (as for the original molecules). All the descriptors are automatically calculated, including the QSAR equation column (GFA Predicted Activity), which should show a value of 3.081, the same as for the original dbh02 molecule in row 1.
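Applying a QSAR equation to a new molecule amounts to evaluating the equation with that molecule's descriptor values, which is why the study table must contain the descriptors the equation uses. The Python sketch below evaluates a hypothetical linear equation; the descriptor names and coefficients are invented for illustration and are not the GFA equation from this session. Because a copy of a molecule has identical descriptor values, it receives an identical predicted activity, while an edited molecule generally does not.

```python
# Hypothetical linear QSAR equation (names and coefficients are illustrative only).
EQUATION = {
    "constant": 2.5,
    "terms": {"DescA": 0.8, "DescB": -0.3},
}


def predicted_activity(equation, descriptors):
    """Evaluate a linear QSAR equation for one molecule's descriptor values."""
    total = equation["constant"]
    for name, coeff in equation["terms"].items():
        # Raises KeyError if the molecule lacks a descriptor the equation uses.
        total += coeff * descriptors[name]
    return total


original = {"DescA": 1.5, "DescB": 2.0}
copy_of_original = dict(original)      # same structure, same descriptor values
edited = {"DescA": 1.5, "DescB": 1.0}  # e.g. after an atom is changed and descriptors recalculated

print(predicted_activity(EQUATION, original))
print(predicted_activity(EQUATION, copy_of_original))
print(predicted_activity(EQUATION, edited))
```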

2.   Modifying an existing molecule

QSAR+ allows you to easily inspect the effect of chemical changes in an existing molecule (a molecule already in the study table) on the predicted activity value.

Open the Molecule Preferences control panel by selecting the Preferences/Molecules menu item in the study table menubar.

Check the Recalculate Descriptors When Models are Edited checkbox in the Molecule Preferences control panel.

Open the Sketcher control panel by selecting the View/3D-Sketcher menu item in the Cerius2 Visualizer. Use the sketcher to change the sulfur atom in the dbh02_1 model to an oxygen atom (select the Edit Element icon in the Sketcher control panel and choose O from the pulldown to its right, then click the sulfur atom in dbh02_1).

Immediately after you pick the sulfur atom, QSAR+ checks and corrects the number of hydrogens, recalculates charges, minimizes the molecule, and recalculates the descriptors in the study table corresponding to model dbh02_1. The new value obtained for the predicted activity should be 3.879.


Saving the study

You can save your QSAR study, including molecules and the QSAR study table:

Select the File/Save Session menu item in the Cerius2 Visualizer. Enter a name for the session you want to save (such as qsar_quick.mss) and click the SAVE button.

The Cerius2 session is saved for later retrieval.

3.   Finish up

To end the Cerius2 session, close all open control panels and select File/Exit from the Visualizer menu bar.
If you want to continue using Cerius2, close all control panels and select File/New Session from the Visualizer menu bar.


Summary

This chapter began by describing the general procedure for generating a QSAR equation. The chapter then familiarized you with QSAR+ by illustrating the steps you could perform to build a QSAR equation. As it described each step, the chapter pointed out default settings and processing performed automatically by QSAR+.

As you become more experienced with using QSAR+ to build QSAR equations, you may want to experiment with the menu items on the QSAR card, which provide the same functions as the study table menubar. Doing so lets you perform tasks in the order that best suits your analysis and control how much processing QSAR+ performs automatically.

Additional tutorials are provided as appendixes to this documentation set.




Last updated May 18, 2000 at 05:48PM Pacific Daylight Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.