| QSAR |

You can use QSAR+ to help make informed decisions about which candidate compounds should be considered (based on estimates of biological activity), as well as to help gain insight into various underlying biological processes. You can also use QSAR+ to provide basic insight into structure-property relationships. You might, for instance, first use other Cerius2 modules to model the atomic-level mechanisms behind quantitative structure-activity relationships. Using the analytical capabilities available in QSAR+, you can then correlate the values calculated from the modeling programs with various properties. This correlation ability makes QSAR+ a useful complement to your molecular modeling programs.
This chapter familiarizes you with QSAR+ by illustrating the procedure for building a QSAR equation using C2·QSAR+.
Understanding the QSAR generation process
This section describes the general procedure for generating a QSAR, which consists of the following 11 steps:
The first step is to choose the molecular structures to use as the training set. QSAR+ provides tools that enable you to build new structures, create a congeneric series of 3D structures, and import chemical structure files in a wide variety of formats.
2. Enter biological activity data
For each of the molecules in the training set, you must provide information about the observed biological activity associated with that molecule.
If you are performing a 3D QSAR analysis, you usually need conformational information, which you typically obtain by performing a conformational search. You can choose from a variety of conformation generation methods.
QSAR+ calculates a wide variety of spatial, electronic, topological, information-content, thermodynamic, conformational, quantum mechanical, and shape descriptors. QSAR+ gives you the ability to modify existing descriptors and the combination of descriptors in a descriptor set. You can create or import new descriptors from other Cerius2 modules such as Molecular Field Analysis (MFA) and Receptor to meet your specific requirements.
You can generate graphs to depict descriptor distribution. If holes exist in descriptor sets, you can choose new compounds to fill the holes. You can also display correlation matrices to help you identify highly correlated descriptors, and histograms and rune plots to help you examine the uniformity of your data. Descriptive statistics are available to further characterize descriptors. Additionally, you can transform and normalize descriptors, as appropriate. You can also carry out principal components analysis (PCA) and cluster analysis to further characterize your data.
After you identify the appropriate dependent and independent variables, you can choose from several statistical methods for generating a QSAR equation. These include multiple linear regression, partial least squares (PLS), simple linear regression, stepwise multiple linear regression, and principal components regression (PCR). Additionally, if the genetic function approximation (GFA) functionality is installed, you can also perform a genetic analysis, either GFA or G/PLS, to create a QSAR equation.
You can apply validation techniques to identify outliers and leverage points. You can also use graphic analyses and crossvalidation to characterize the robustness of the QSAR.
You can use graphical tools to plot observed vs. predicted activities and to identify outliers. You can also generate 3D plots to visualize the positions of important 3D-QSAR descriptors from Molecular Field Analysis (MFA) or Receptor Surface Analysis (RSA) in relation to the molecules.
You can save calculated QSAR equations in QSAR+ equation databases for later use.
You can now use a calculated QSAR equation to predict biological activity of compounds. In Cerius2·QSAR+, you can simply draw a candidate structure, add it to your study, apply your calculated QSAR equation, and immediately view the predicted activity.
When you are finished with your study, you can save the entire QSAR analysis, including all its component structures and conformations, for later review and use.
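The generate and validate steps above can be sketched in a few lines of Python. This is only an illustration of the underlying statistics (a one-descriptor least-squares fit and a leave-one-out crossvalidated r2), not the QSAR+ implementation. The function names and the data values are invented for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def loo_cross_validated_r2(x, y):
    """Leave-one-out: refit without each point, predict that point, and
    compare the summed squared prediction error (PRESS) with the total
    sum of squares of the observed values."""
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total

# Made-up descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

intercept, slope = fit_line(descriptor, activity)
q2 = loo_cross_validated_r2(descriptor, activity)
print(f"activity = {intercept:.3f} + {slope:.3f} * descriptor")
print(f"crossvalidated r2 = {q2:.3f}")
```

A crossvalidated r2 much lower than the fitted r2 would suggest that the equation depends heavily on a few training-set molecules.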
Using QSAR+
Cerius2·QSAR+ does not impose a specific order in which you must perform various tasks. Instead, QSAR+ attempts to meet the needs of both novice and experienced users of the QSAR generation process.
All the QSAR+ commands can be accessed from the menu cards in the QSAR card deck. (The menu cards in this deck can vary, depending on the modules you have licensed.) More experienced users can select items from these cards and perform tasks in the order that best suits the type of analysis that they want to perform. Any user, novice or experienced, can also use these cards to change various settings and to control the amount of processing performed automatically by QSAR+.
The software must be ready. That is:
The appropriate Cerius2 software must be installed and running.

1. Load the molecules into Cerius2
Starting with a new Cerius2 session, select the File/Load Model menu item in the Cerius2 Visualizer panel and use the file browser to navigate to the directory Cerius2-Resources/EXAMPLES/DBH.
Select all the .msi models contained in this directory, from dbh02 to dbh52, and click LOAD to load them into Cerius2.
A total of 47 models should be loaded and listed in the Cerius2 Model Manager.
Select the QSAR deck of cards and select the Show Study Table menu item on the QSAR card. This opens the QSAR Study Table control panel.
As each molecule is added, QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization. Charge calculation, hydrogen addition, and energy minimization are performed according to the default values or user-specified criteria for each of these tasks. These default settings can be modified by selecting the Preferences/Molecules menu item in the study table menubar.
Entering biological activity data
QSAR+ needs biological activity data for each molecule in the training set. In this example, you enter biological activity data for a set of molecules into the study table. You enter biological activity data in the same way that you enter data into any Cerius2 table. That is, you can type the activity data directly into the study table cells or copy the activity data from another table. In this session you enter the data directly into the study table.
Use the mouse to select the cell (in the Activity column) in which you want to enter data from Table 1 (below) and then type the data. Typed characters appear both in the cell and in the edit window at the top of the study table (where editing and formatting of data take place). Formatted data is entered into the cell when you press <Enter> or use the mouse to select another table cell.
At this point in the process of building a QSAR equation, you have a study table containing both chemical structures and biological activity data for each molecule with which you want to work. Additionally, QSAR+ has (by default) added hydrogens, performed energy minimizations, and calculated charges for each of these molecules. You are now ready to perform the next step and generate a QSAR equation.
Calculating descriptors
A descriptor is any of a number of built-in molecular properties that QSAR+ can calculate and use in determining new QSAR relationships. QSAR+ provides a wide variety of spatial, electronic, topological, information-content, thermodynamic, conformational, quantum mechanical, and shape descriptors. Descriptor data can be imported from other Cerius2 modules, including Receptor and MFA. Groups of descriptors are designated default descriptors for different applications (QSAR, Diversity, QSPR). In this session you will use the default descriptors for the QSAR application.
Make sure that the default options corresponding to the QSAR application are set by selecting the Preferences/Default Set/QSAR menu item in the study table menubar.
At this point 20 molecular descriptors are added to the study table, and their values are calculated for each molecule present in the table.
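Once descriptor columns are in the study table, pairwise correlations are one way to spot descriptors that carry nearly the same information, which is what a correlation matrix is used for. The Python sketch below illustrates the idea outside of Cerius2. The column names, the values, and the 0.9 threshold are all assumptions made for the example, not QSAR+ defaults.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlated_pairs(columns, threshold=0.9):
    """Return (name1, name2, r) for every column pair with |r| >= threshold."""
    pairs = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson(c1, c2)
        if abs(r) >= threshold:
            pairs.append((n1, n2, r))
    return pairs

# Hypothetical descriptor columns; names and values are invented.
table = {
    "mol_weight": [150.0, 164.0, 178.0, 192.0, 206.0],
    "mol_volume": [120.0, 133.0, 145.0, 158.0, 171.0],
    "dipole": [1.2, 3.4, 0.8, 2.9, 1.5],
}

for n1, n2, r in correlated_pairs(table):
    print(f"{n1} vs {n2}: r = {r:.3f}")
```

When two descriptors are flagged this way, keeping only one of them as an independent variable usually makes the resulting equation easier to interpret.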
Setting dependent and independent variables and exploring the data
Before you generate a QSAR equation you need to specify which columns in the study table should be used as dependent and independent variables.
2. Set the independent variables
Generating a QSAR equation
You are now ready to generate a QSAR equation. Several regression methods are available in QSAR+, including multiple linear regression, partial least squares (PLS), simple linear regression, stepwise multiple linear regression, principal components regression (PCR), genetic function approximation (GFA), and G/PLS. In this session you will use the GFA method.
Set the Methods popup to GFA. (This popup is the one next to the RUN pushbutton in the study table.) Then click the RUN pushbutton to start the GFA calculation with the default parameters.
The GFA calculation takes a few minutes. It generates a set of 99 or 100 QSAR equations, which will be downloaded into the Equation Viewer control panel, sorted by the lack-of-fit (LOF) parameter. By default, the first (best) equation is validated using the crossvalidation method. This QSAR equation is automatically inserted as a new column in the study table (GFA Predicted Activity), along with a column showing the residuals (observed minus predicted activity values, GFA Residuals Activity). A plot of predicted vs. observed activity values is also displayed in the plot window. Results of the crossvalidation of the QSAR equation are shown in the text window.
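To make the residual and LOF ideas concrete, the following Python sketch computes residuals (observed minus predicted) and ranks two candidate equations by a lack-of-fit score. It assumes the commonly quoted Friedman form LOF = LSE / (1 - (c + d*p)/M)^2, where c is the number of terms, p the number of descriptors used, d a smoothing parameter, and M the number of molecules. Whether QSAR+ uses exactly this form, the mean-squared definition of LSE, and the smoothing value of 1.0 are assumptions, and all data are invented.

```python
def residuals(observed, predicted):
    """Observed minus predicted activity for each molecule."""
    return [o - p for o, p in zip(observed, predicted)]

def lack_of_fit(observed, predicted, n_terms, n_features, smoothing=1.0):
    """Assumed Friedman-style LOF: error inflated by a complexity penalty."""
    m = len(observed)
    # LSE taken here as the mean squared residual (an assumption).
    lse = sum(r ** 2 for r in residuals(observed, predicted)) / m
    penalty = 1.0 - (n_terms + smoothing * n_features) / m
    if penalty <= 0:
        raise ValueError("too many terms for this number of molecules")
    return lse / penalty ** 2

# Invented activities and predictions from two hypothetical equations.
observed = [3.1, 3.5, 4.0, 4.4, 5.0, 5.3, 5.9, 6.2]
candidates = {
    "eq_A": {"predicted": [3.2, 3.4, 4.1, 4.3, 5.1, 5.2, 6.0, 6.1],
             "n_terms": 2, "n_features": 2},
    "eq_B": {"predicted": [3.18, 3.42, 4.08, 4.32, 5.08, 5.22, 5.98, 6.12],
             "n_terms": 3, "n_features": 3},
}

# Lowest LOF first, mirroring a list of equations sorted by LOF.
ranked = sorted(candidates,
                key=lambda name: lack_of_fit(observed, **candidates[name]))
for name in ranked:
    print(name, round(lack_of_fit(observed, **candidates[name]), 4))
```

In this made-up case the equation with the slightly larger residuals still ranks first, because the penalty term grows with the number of terms. Discouraging overfitting in this way is the purpose of a lack-of-fit measure.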

Make sure the Equation Viewer control panel is visible by selecting the Tools/Equation Viewer menu item in the study table menubar.
Click the More... button in the QSAR Equation section of the Equation Viewer control panel to open the preferences control panel for QSAR equations. Then check the Auto update 2D Plot checkbox.
Now, every time you select a QSAR equation in the Equation Viewer control panel, the corresponding predicted vs. observed activity plot is automatically updated.
Select the QSAR equation number 1 in the Equation Viewer control panel and click the Plot Equation action button. The 2D plot of predicted vs. observed activity should be updated.
The rows corresponding to the selected points in the 2D plot are highlighted in the study table. In addition, the corresponding molecules are made visible in the models window, and information about the selected molecules is printed in the text window.
Saving the QSAR equations
QSAR+ allows you to save the QSAR equations for later retrieval and use.
Open the Save QSAR Equations control panel by clicking the Save Equations pushbutton in the Equation Viewer control panel.
The entire set of GFA equations is saved in the file testset.qsar.
Open the Open QSAR equations control panel by clicking the Open Equations pushbutton in the Equation Viewer control panel.
Predicting activity of new molecules
Once you have calculated a QSAR equation, it is easy to use it to predict the activity of a molecule outside the training set.
Select the File/Load Model menu item in the Cerius2 Visualizer panel and navigate to the same directory you used at the beginning of the session:
Cerius2-Resources/EXAMPLES/DBH
Select the dbh02.msi file and click LOAD to load the molecule into Cerius2. The copy of dbh02 is named dbh02_1.
Make sure that model dbh02_1 is the current model in the Models window and add it to the study table by selecting the Molecules/Add Current menu item in the study table menubar.
The new molecule is added at the bottom of the study table. QSAR+ automatically calculates charges, adds hydrogens, and performs an energy minimization (as for the original molecules). All the descriptors are automatically calculated, including the QSAR equation column (GFA Predicted Activity), which should show a value of 3.081, the same as for the original dbh02 molecule in row 1.
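The reason the copy receives the same predicted activity as the original can be shown with a small Python sketch: a QSAR equation is a function of descriptor values only, so identical descriptors give identical predictions. The sketch assumes a purely linear equation, which a GFA equation need not be. The coefficients and the descriptor names (logP, dipole) are hypothetical and are not taken from the DBH example.

```python
def predict_activity(equation, descriptors):
    """Evaluate intercept + sum(coefficient * descriptor value)."""
    missing = [n for n in equation["coefficients"] if n not in descriptors]
    if missing:
        raise KeyError(f"missing descriptor values: {missing}")
    return equation["intercept"] + sum(
        coef * descriptors[name]
        for name, coef in equation["coefficients"].items()
    )

# Hypothetical linear equation and descriptor values.
equation = {"intercept": 1.5, "coefficients": {"logP": 0.5, "dipole": -0.25}}
original = {"logP": 2.0, "dipole": 1.0}
copy_of_original = dict(original)          # same structure, same descriptors
edited = {"logP": 2.5, "dipole": 1.0}      # descriptors after a structural edit

print(predict_activity(equation, original))
print(predict_activity(equation, copy_of_original))
print(predict_activity(equation, edited))
```

Editing the structure changes its descriptor values, so the prediction changes only once the descriptors are recalculated.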
Open the Molecule Preferences control panel by selecting the Preferences/Molecules menu item in the study table menubar.
Check the Recalculate Descriptors When Models are Edited checkbox in the Molecule Preferences control panel.
Immediately after you pick the sulfur atom, QSAR+ checks and corrects the number of hydrogens, recalculates charges, minimizes the molecule, and recalculates the descriptors in the study table corresponding to model dbh02_1. The new value obtained for the predicted activity should be 3.879.
Saving the study
You can save your QSAR study, including molecules and the QSAR study table:
Select the File/Save Session menu item in the Cerius2 Visualizer. Enter a name for the session you want to save (such as qsar_quick.mss) and click the SAVE button.
The Cerius2 session is saved for later retrieval.
Summary
This chapter began by describing the general procedure for generating a QSAR equation. The chapter then familiarized you with QSAR+ by illustrating the steps you could perform to build a QSAR equation. As it described each step, the chapter pointed out default settings and processing performed automatically by QSAR+.
As you become more experienced with using QSAR+ to build QSAR equations, you may want to experiment with the menu items in the QSAR card (same functions as in the study table menubar). Doing so enables you to perform tasks in the order that best suits the type of analysis that you want to perform, as well as control the amount of processing performed automatically by QSAR+.
Additional tutorials are provided as appendixes to this documentation set.