
To meet your unique requirements for building QSAR equations, QSAR+ enables you to work with descriptors in a variety of ways.
You manage descriptors by using the Descriptors control panel (described later in this chapter). Descriptor management includes activities such as identifying the descriptors with which you want to work, displaying and selecting only descriptors in a specific class, specifying preferences for the different descriptors, and adding descriptors to the study table.
Editing the descriptor database
When QSAR+ is installed, you can access a descriptor database that contains the equations used to calculate molecular descriptors. You can edit this database to modify the supplied descriptors, create new descriptors, specify which descriptors should be considered default descriptors, create new descriptor categories, and control the format in which the results of descriptor calculations are displayed in the study table.
The following activities related to working with descriptors are included in this chapter:

You can see the descriptors in each set by choosing Descriptors/Databases from the study table menu bar. This opens the Descriptor Database control panel, which contains a list of descriptors.
The message at the top of the Descriptor Database control panel identifies the current defaults set.
QSAR defaults descriptor set
This section provides information about the following activities related to managing descriptors:
Managing descriptors
Selecting descriptors
Descriptors are selected using the Descriptors control panel. To access the Descriptors control panel, select the Descriptors/Select menu item in the study table.
The control panel contains a list of the descriptors in the current descriptor database. To select a descriptor, click its cell in column one. For example, clicking EPenalty highlights row 1 of the descriptor table, which means that descriptor will be added to the study table (see the next section for details). To unselect a descriptor, click any part of the table other than the first column, so that the highlight is turned off.
The control panel contains four controls that allow you to select groups of descriptors. The left popup controls whether the action is to select, deselect, or display the selected descriptors. For example, to select all the conformational descriptors, choose Select in the left popup and set the Descriptors in Family popup to Conformational. When you click the (unlabeled) action button (below ADD), the conformational descriptors are selected. To deselect them, change the left popup to Deselect, then click the action button again.
If you find the display of all the descriptors at the same time distracting, you can display just the selected descriptors by setting the Select/Deselect popup to Display.
Another way to select a subset of descriptors is to use the All/Default popup. To see the effect of this control, set the Descriptors in Family popup to Electronic, select Default from the All/Default popup, then click the action button.
This highlights the default electronic descriptors. To select all the electronic descriptors, set the All/Default popup to All, then click the action button.
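The group-selection behavior described above can be pictured as a filter over the rows of the descriptor table. The following Python sketch is an illustration only, not QSAR+ code: the `Descriptor` record, the `click_action` function, and the sample rows are hypothetical, and the assumption that Deselect respects the All/Default setting in the same way as Select is not confirmed by this manual.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """One row of a hypothetical descriptor table."""
    name: str
    family: str
    default: bool          # mirrors the Default column (1 or 0)
    selected: bool = False


def click_action(table, action, family, scope):
    """Mimic one click of the action button.

    action: "Select" or "Deselect" (left popup)
    family: value of the Descriptors in Family popup
    scope:  "All" or "Default" (All/Default popup)
    Returns the names of all descriptors highlighted afterwards.
    """
    if action not in ("Select", "Deselect"):
        raise ValueError(f"unsupported action: {action}")
    if scope not in ("All", "Default"):
        raise ValueError(f"unsupported scope: {scope}")
    for d in table:
        in_scope = d.family == family and (scope == "All" or d.default)
        if in_scope:
            d.selected = action == "Select"
    return [d.name for d in table if d.selected]


# Placeholder rows; the names and Default flags are invented for illustration.
table = [
    Descriptor("elec_a", "Electronic", default=True),
    Descriptor("elec_b", "Electronic", default=False),
    Descriptor("conf_a", "Conformational", default=True),
]
print(click_action(table, "Select", "Electronic", "Default"))
print(click_action(table, "Select", "Electronic", "All"))
```

The first call highlights only the default electronic row; the second extends the highlight to every electronic row, leaving other families untouched.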
Setting descriptors preferences
You may have noticed that selecting certain families of descriptors causes the Preferences button to become active and to change its name. When the Descriptors in Family popup is set to Electronic, for example, the Preferences button is labelled Electronic.
When you click this newly active button, a control panel appears, which allows you to customize certain aspects of the way the electronic descriptors are calculated.
If you need only the total dipole moment, uncheck the XYZ Components checkbox. Only the total dipole moment (calculated from atomic partial charges) is then added to the study table. Preferences for the calculation of other types of descriptors may be set in the same way.
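To make the effect of the XYZ Components checkbox concrete, the sketch below computes a point-charge dipole moment (the sum of each partial charge times its position) and returns either the total alone or the total plus its components. It is an illustration under stated assumptions: the function name `dipole_columns` and the column labels are hypothetical, and no unit conversion is applied (the result is in charge units times coordinate units).

```python
import math


def dipole_columns(charges, coords, xyz_components=True):
    """Point-charge dipole moment from partial charges and positions.

    charges: partial charge of each atom
    coords:  (x, y, z) position of each atom, same order as charges
    Returns a dict of column label -> value. The labels are hypothetical.
    """
    mx = sum(q * x for q, (x, _, _) in zip(charges, coords))
    my = sum(q * y for q, (_, y, _) in zip(charges, coords))
    mz = sum(q * z for q, (_, _, z) in zip(charges, coords))
    columns = {"Dipole_total": math.sqrt(mx * mx + my * my + mz * mz)}
    if xyz_components:
        # Only added when the (hypothetical) XYZ Components option is on.
        columns.update({"Dipole_X": mx, "Dipole_Y": my, "Dipole_Z": mz})
    return columns


# Two opposite unit charges one coordinate unit apart along x.
print(dipole_columns([1.0, -1.0], [(0, 0, 0), (1, 0, 0)], xyz_components=False))
```

With `xyz_components=False` only one column is produced, which matches the behavior described for an unchecked XYZ Components checkbox.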
Daylight descriptors preferences
The maximum error levels allowed in the Daylight calculation of ClogP and CMR are customizable through the Daylight Descriptors control panel. Options are also provided to add the error level values to the study table as separate columns. Open this control panel by setting the family popup in the Descriptors control panel to Daylight and then selecting the Daylight button.
Information-content descriptor preferences
This descriptor relates to the atomic composition of molecules.
The first checkbox in the Information Content Descriptors control panel sets the information of atomic composition index, created by partitioning the atoms of the molecule into equivalence classes based on their atomic numbers.
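As an illustration of partitioning atoms into equivalence classes by atomic number, the sketch below computes a Shannon-entropy style information content from the class sizes. This is a common formulation in the literature and is an assumption here; the definition and normalization actually used by QSAR+ are given in Chapter 4 and may differ. The function name is hypothetical.

```python
import math
from collections import Counter


def atomic_composition_information(atomic_numbers):
    """Information content of the atomic composition.

    Atoms with the same atomic number form one equivalence class.
    Returns (mean, total): mean is the Shannon entropy in bits per atom,
    total is that value multiplied by the number of atoms.
    """
    counts = Counter(atomic_numbers)
    n = len(atomic_numbers)
    mean = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, n * mean


# Methane, CH4: one carbon (Z = 6) and four hydrogens (Z = 1).
mean, total = atomic_composition_information([6, 1, 1, 1, 1])
print(round(mean, 4), round(total, 4))
```

A molecule made of a single element has one equivalence class and therefore zero information content under this formulation.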
If Edge-based is checked, the four buttons below apply to information indices based on the edge adjacency and edge distance matrices, specifically,
For a detailed explanation of this descriptor, see Chapter 4, Theory: QSAR+ Descriptors.
Receptor descriptor preferences
The receptor descriptor preferences come in two control panels: Receptor-Model Interactions and RSA (Receptor Surface Analysis) Preferences.
An explanation of each of the above is given in Chapter 4, Theory: QSAR+ Descriptors.
You cannot add receptor descriptors to the study table until you have specified a receptor surface model. For information on this, see Using receptor surface analysis descriptor on page 144.
Receptor surface analysis (RSA) preferences
The RSA control panel is similar to 1, 2, and 3 above, except that instead of the sum of each type of interaction being added to the study table, the interaction at each point on the receptor surface is added. This typically produces several thousand columns of data. To limit the number of columns, a filtering control is provided: Filter Surface Points, which lets you select a subset of the points in terms of Every Nth Surface Point, Variance, and Correlation.
Spatial preferences
The Spatial Descriptors control panel controls the calculation of spatial descriptors such as the moment of inertia about the principal axes of a molecule. The calculation is controlled by checkboxes. For example, if you want the magnitude of the moment of inertia, but not its Cartesian components, then uncheck the XYZ Components checkbox. See the Principal moment of inertia (PMI) section, page 84, for a theoretical explanation of the principal moment of inertia descriptor.
Jurs charged partial surface area parameters
The definition of polar atoms and the probe radius for the solvent-accessible surface area calculation can be customized with the Spatial Descriptors control panel. Open this control panel by setting the family popup in the Descriptors control panel to Spatial and then selecting the Spatial button.
For an explanation of the shadow indices see the Shadow indices section on page 80.
Defining hydrogen-bond acceptors and donors and rotatable bonds
The definitions of hydrogen-bond acceptors, hydrogen-bond donors, and rotatable bonds can be customized with the Structural Descriptors control panel.
Open this control panel by setting the family popup in the Descriptors control panel to Structural and then selecting the Structural button.
Thermodynamic descriptors preferences
AlogP98 descriptors
The 115 atom types defined in the calculation of AlogP98 are now available as descriptors. To calculate them, select the entry AlogP_atypes in the Thermodynamic family in the descriptor table. Each AlogP98 atom-type value represents the number of atoms of that type in the molecule. An additional atom type called Unkown_Type can also be added to the table, together with the other AlogP98 atom types. A value greater than zero for this descriptor indicates the presence of atoms that couldn't be classified as any of the defined AlogP98 atom types. The AlogP Atom Types control panel allows you to select the elements to be taken into account.
Open this control panel by setting the family popup in the Descriptors control panel to Thermodynamic and then selecting the Thermodynamic button.
Topological descriptors preferences
For an explanation of the topological descriptors see the discussion of graph-theoretical (page 56) and information-content descriptors (page 74).
Adding descriptors to the study table
When you have selected the set of descriptors that you want to use, you may add them to the study table by clicking the ADD button in the top left corner of the Descriptors control panel.
Using ISIS keys and Daylight fingerprints
ISIS keys
To work with ISIS keys, select Descriptors/Fingerprints/Isis Keys from the study table to open the 2D Fingerprints Isis Keys control panel. With this control panel, you can:

The second control panel (RSA_Preferences) controls the addition of interaction energies at each vertex of the surface. You may add only the VDW (steric) component of the interaction energy, only the electrostatic component (ELE), or both (TOT), by clicking the VDW, ELE, or TOT button as appropriate. A column is created in the study table for each point on the receptor surface model, containing the energy of interaction at that point between the surface and the molecule. For a large receptor surface model, adding all points can create several thousand columns, which is too many for some of the available statistical methods. You can reduce the number of points added to the study table with the Filter Surface Points control.
Three methods are available, based on selecting every nth surface point, variance, or correlation.
1. Selection of every nth surface point
   b. Add every nth surface point
2. Selection by variance
   c. Add points with variance higher than threshold
   d. Add percentage of points with highest variance
3. Selection by correlation
   e. Add points with correlation higher than threshold
   f. Add percentage of points with highest Correlation^2
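The three filtering methods listed above can be sketched in a few lines of Python. This is an illustration, not the QSAR+ implementation: all function names are hypothetical, each surface point is represented as a list of interaction energies (one per molecule), and several details are assumptions, including the use of population variance, counting every nth point from the first index, correlating against the activity column, and rounding percentages up.

```python
import math
from statistics import pvariance


def every_nth(points, n):
    """Keep every nth surface point, starting from the first index."""
    return sorted(points)[::n]


def r_squared(xs, ys):
    """Squared Pearson correlation; 0.0 if either series is constant."""
    count = len(xs)
    mx, my = sum(xs) / count, sum(ys) / count
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def variance_scores(points):
    """Variance of the energies at each surface point across molecules."""
    return {i: pvariance(values) for i, values in points.items()}


def correlation_scores(points, activity):
    """Correlation^2 between each surface point and the activity data."""
    return {i: r_squared(values, activity) for i, values in points.items()}


def above_threshold(scores, threshold):
    """Indices of points whose score is higher than the threshold."""
    return sorted(i for i, s in scores.items() if s > threshold)


def top_percent(scores, percent):
    """Indices of the given percentage of points with the highest scores."""
    k = math.ceil(len(scores) * percent / 100)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:k])


# Invented data: four surface points, three molecules.
points = {1: [0, 0, 0], 2: [1, 2, 3], 3: [1, 1, 2], 4: [3, 2, 1]}
activity = [1, 2, 3]
print(every_nth(points, 2))
print(above_threshold(variance_scores(points), 0.5))
print(top_percent(correlation_scores(points, activity), 50))
```

Separating the scoring step (variance or Correlation^2) from the selection step (threshold or percentage) mirrors the way the options are grouped in the list above.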
Next, click the button on the extreme left side (underneath the ADD button). This displays the receptor descriptors Receptor_energies and Receptor_RSA. To select the Receptor_RSA descriptor, click the cell containing the label Receptor_RSA. To add the receptor surface data to the study table, click the ADD button. The receptor surface points are added to the study table.
These points may be displayed with the Manage Independent Columns control panel, which is accessed by selecting the Variables/Manage Independent menu item in the Study Table. Set the 3D-QSAR popup to RSA and click the Label Independent Variables button.
Surface points in the study table are then displayed on the receptor surface model as text labels, for example, TOT/123. The first part of the label refers to the type of energy term specified in the RSA Preferences control panel under Include Molecule-Surface Point Interaction Energies. The second part is the number of the surface point and is the same index as the Surface point index in the first column of the output of the Receptor List function.
Typically, the next stage is to calculate a QSAR that relates the receptor surface energies at each surface point to experimental activity data. For a guide to calculating QSARs, see Chapter 14, Using the Equation Viewer, and Chapter 2, QSAR+ QuickStart.

Using pKa descriptors
Installing pKa
For the pKa program to be found by Cerius2, it must be listed in the applcomm.db file in $C2DIR/libraries/applcomm.db. The form of the entry is:
A unix pKa pathname
where pathname is replaced by the pathname of your pKa application.
Adding pKa descriptors to the study table
The pKa descriptors are included in the QSAR, COMBICHEM, and QSPR descriptor databases. The three steps to adding pKa descriptors to the study table are:
1. Open the appropriate descriptor database
2. Set the pKa descriptor preferences:
3. Add the pKa descriptors to the study table
A count of pKa columns begins with the string n_pKa_. This is followed by the range of values being counted. For example, n_pKa_0.00_14.00 is a count of pKas with values between 0.00 and 14.00.
A list of pKa columns begins with the string pKa_. The first number tells which pKa value among the selected pKas is held in this column. The second number gives the maximum number of pKas to be listed. The third number specifies whether the pKas are being listed from low to high (number = 0) or from high to low (number = 1). The fourth number specifies whether a range (number = 0), a lower bound (number = 1), or an upper bound (number = 2) is being used to select the pKas to list. If a range is used, it is followed by two numbers specifying the range. If a lower or upper bound is used, it is followed by the number specifying the bound. For example, pKa_1_2_0_2_14.00 is the lowest pKa of a maximum of two pKas under the bound of 14.00.
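The column-naming rules above can be checked with a small parser. The following Python sketch is a reading aid, not part of Cerius2: the function `parse_pka_column` and the keys of the returned dictionary are hypothetical, and it assumes that fields are always separated by single underscores.

```python
ORDER = {0: "low_to_high", 1: "high_to_low"}
SELECTION = {0: "range", 1: "lower_bound", 2: "upper_bound"}


def parse_pka_column(name):
    """Decode a pKa column name into its documented fields."""
    if name.startswith("n_pKa_"):
        fields = name[len("n_pKa_"):].split("_")
        if len(fields) != 2:
            raise ValueError(f"expected two range values in {name!r}")
        return {"kind": "count", "low": float(fields[0]), "high": float(fields[1])}

    if name.startswith("pKa_"):
        fields = name[len("pKa_"):].split("_")
        if len(fields) < 5:
            raise ValueError(f"too few fields in {name!r}")
        index, max_count, order, selection = (int(f) for f in fields[:4])
        bounds = [float(f) for f in fields[4:]]
        # A range carries two numbers; a lower or upper bound carries one.
        expected = 2 if selection == 0 else 1
        if order not in ORDER or selection not in SELECTION or len(bounds) != expected:
            raise ValueError(f"inconsistent fields in {name!r}")
        return {
            "kind": "list",
            "index": index,            # which pKa among those selected
            "max_count": max_count,    # maximum number of pKas listed
            "order": ORDER[order],
            "selection": SELECTION[selection],
            "bounds": bounds,
        }

    raise ValueError(f"not a pKa column name: {name!r}")


print(parse_pka_column("n_pKa_0.00_14.00"))
print(parse_pka_column("pKa_1_2_0_2_14.00"))
```

For the second example name, the parser reports the first pKa of at most two, listed from low to high, selected with an upper bound of 14.00, which agrees with the description in the text.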
A descriptor database is a Cerius2 table that contains the equations and equation coefficients used to calculate molecular descriptors. When QSAR+ is installed, you can access a database that contains over 100 spatial, electronic, thermodynamic, conformational, and shape descriptors.
Editing a descriptor database
You can modify an existing descriptor database or create a new one by editing the installed descriptor database provided in QSAR+, then saving the modified descriptor database under a new name.
This section describes the following activities related to editing a descriptor database:
Because the descriptor database is accessed as a Cerius2 table, you should be familiar with Cerius2 tables before performing any activities described in this section. For information about tables and basic table operations, see the Modeling Environment manual.
Opening a descriptor database
You must select and open a descriptor database in a descriptor database table before you can edit it. The default database name is listed in the text window when you open QSAR+.
To open a descriptor database
If you have only a single database or if you want to use the currently selected database, select Descriptors/Databases on the QSAR menu card. The QSAR application starts, and the Descriptor Database control panel appears.
If you have more than one descriptor database and want to change the selected database:
You can change the contents of the set of default descriptors by editing the Default column.
Adding a descriptor to the default set
To add a descriptor to the default set:
1. Select a cell in the Default column.
2. Clear the edit window and enter 1.
3. Press <Return> or click any other cell in the table.
To remove a descriptor from the default set:
1. Select a cell in the Default column.
2. Press <Return> or clear the edit window and enter 0.
3. Click in any other cell in the table.
To create a new descriptor:
1. Insert a new row in the descriptor database table using the Insert icon.
2. In the Family column, enter a family name.
3. Enter a descriptor equation in the Value column using valid math and molecular operators.
4. In the Description column, enter a short description of the descriptor.
5. In the 3D column, enter 0 if your descriptor is not a 3D descriptor. Enter 1 if the descriptor is 3D.
6. In the Default column, enter 1 if you want the descriptor to be part of the default set. Enter 0 if the descriptor is not to be a default descriptor (see Identifying default descriptors on page 151).
7. In the Format column, enter the format for descriptor values to be displayed in the Study Table.
8. In the Decimal column, enter the number of decimal places to be displayed in a descriptor value. If you entered integer in the Format column, enter 0.
12. Enter a name (for example, Halogens) in the Row Name entry box.
13. Click Apply To. The row name is entered in the first column of the selected row.
14. Save the database containing the new descriptor. You can save the descriptor to the current database, to another existing database, or to a new database. For more information, see Saving a descriptor database on page 155.
To activate a new descriptor, you must first save the descriptor database with the descriptor in it.
When you finish creating a descriptor, you can check to see that it is correctly entered by adding it to the study table and inspecting the generated data (see Adding descriptors to the study table on page 143).
Modifying descriptors
You can modify an existing descriptor in a database by editing the entry for the descriptor in the Value column of the descriptor database table. For example, to modify the Halogens descriptor defined in the last section so that it counts fluorine as well as chlorine and bromine atoms, enter:
Save the database to activate the edited descriptor (see Saving a descriptor database on page 155).
When you finish modifying a descriptor, you can check to see that the modifications are correct by adding it to the study table and inspecting the generated data (see Adding descriptors to the study table on page 143).
Controlling the descriptor display format
You can control the numerical format of a descriptor value using one of the following options: floating decimal (float), integer (integer), or scientific notation (scientific).
To change the descriptor display format, edit the entry displayed in the Format column of the descriptor database table to the option of your choice.
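The combined effect of the Format and Decimal columns can be illustrated with Python's format specifiers. This sketch is an approximation for orientation only: the function `format_value` is hypothetical, and the exact rounding and exponent style that QSAR+ uses in the study table are assumptions.

```python
def format_value(value, fmt, decimals):
    """Render a descriptor value as float, integer, or scientific text.

    fmt:      one of "float", "integer", "scientific" (Format column)
    decimals: number of decimal places (Decimal column); use 0 for integer
    """
    if fmt == "integer":
        return str(int(round(value)))
    if fmt == "float":
        return f"{value:.{decimals}f}"
    if fmt == "scientific":
        return f"{value:.{decimals}e}"
    raise ValueError(f"unknown format: {fmt!r}")


print(format_value(3.14159, "float", 2))         # 3.14
print(format_value(12345.678, "scientific", 2))  # 1.23e+04
print(format_value(2.0, "integer", 0))           # 2
```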
Creating new descriptor categories
The entry in the Family column of the descriptor database table categorizes descriptors and determines the list of choices in the Descriptors in family popup in the Descriptors control panel.
Creating a new descriptor family
You can create new categories of descriptors by placing new entries in the Family column. For example, if investigator Jones wants to place all saved equations in a category named Jones-QSARs, Jones simply enters this designation in the Family column for the rows containing QSARs and saves the modified table. The value Jones-QSARs now appears as a choice in the Descriptor Set popup list on the Select Descriptors control panel.
Saving a descriptor database
Note that a change made in the descriptor database table is not activated until the table is saved and then read back into Cerius2 with OPEN DATABASE.
To save the displayed database to the current database file, go to the QSAR menu card and select Descriptors/Databases. This opens the Descriptor Database control panel. When you click the SAVE DATABASE button, the Save Database control panel appears, which lets you choose a name for your new or modified database.
1. The Save Descriptor Database control panel appears.