
Generate Hypothesis Workbench



Descriptions of menu items used in hypothesis generation:

Workbench | Data | Edit | View | Style | Tools | Windows

Overview of hypothesis generation

Catalyst allows you to use structure and activity data for a set of lead compounds to create a hypothesis characterizing the activity of the lead set. A good hypothesis can deepen your understanding of how your lead compounds act, show the relative importance of their different features, help you evaluate other similar compounds, and suggest new avenues for drug discovery research.

With a hypothesis you can predict the activities of other compounds having the same receptor binding mechanism and score how well the hypothesis explains the activities of each molecule in your training set. You can display and evaluate in three dimensions how the significant features of your lead compounds fit your hypothesis.

Catalyst includes two primary methods for creating hypotheses. The first method is to interactively build a hypothesis by one of several procedures in the View Hypothesis workbench, based upon your knowledge of the significant chemical functions and fragments constituting your target compound. These procedures are explained under To build a hypothesis.

The second method is for Catalyst to automatically generate a hypothesis from a diverse set of lead compounds that have activity data from the same assay. This section describes the automatic tools and the procedures you use in the Generate Hypothesis workbench.

You can run automatic hypothesis generation as a batch job in the background on your computer or another computer on the network. This allows you to use Catalyst for other activities while the background job is running. If you run a background job on a computer that is also being used interactively, memory and processing power must be shared, so operations within the interactive Catalyst session will be slower. If you run the background job on another computer that is not being used for an interactive session, the interactive Catalyst session is not affected.

You can request that Catalyst consider up to five types of functions from the Feature Dictionary for generating a hypothesis. The eleven predefined chemical functions available for automatic hypothesis generation are the same ones used in the View Hypothesis workbench for constructing hypotheses. For the list, see Description of Catalyst's predefined chemical functions.

You can modify the definitions of the predefined chemical functions and use them in generating hypotheses. However, database searching is much slower when a hypothesis includes user-defined functions.

In general, the best hypotheses have five features, but only two or three types of features. If possible, you should use what you know about the characteristics of your lead-compound set to reduce the number of types of features considered during hypothesis generation.

Automatic generation of hypotheses includes three phases:

  1. Prepare your set of lead compounds and generate conformers.
  2. Prepare your input spreadsheet by entering the lead compounds and their activity data.
  3. Set up a background process for hypothesis generation.

Preparation for hypothesis generation

For a general description of how to use hypotheses, see Introduction to hypotheses and constraints. Once you understand how you can use hypotheses, you can prepare to generate a hypothesis automatically from a training set of compounds, as follows:

  1. Create a lab. Use the Create Lab menu item in the Stockroom to create a place to store your training set. Give the lab a unique name. See Stockroom and labs for more information.
  2. Get training set compounds into Catalyst. Use one of these methods:
  3. Save each of the lead compounds in the lab you created.
  4. Check functions and ionization state. The standard functions listed in the Feature Dictionary and used by default for generating hypotheses assume a pH of 7.0. Use the Show Function Mapping menu item in View Hypothesis on each training set molecule to make sure that each function you intend to use maps the training set molecules properly and that the ionization state is appropriate. If the function is not identified properly, you can edit the function definition in the View Hypothesis workbench (see To build a hypothesis).
    If a lead molecule has a different ionization state in its active form, you can create the ionized molecule by editing in the View Compound workbench (see Ionizing and deionizing atoms).
  5. Display appearance. Use the Tools/2D Beautify and Tools/3D Minimize menu items to correct the display of each molecule. Be sure to save any changes if you want them to be permanent.
  6. Structure and stereochemistry. Check that the 3D structure and stereochemistry are correct for each compound before proceeding to the next step. If the molecule is distorted or has incorrect stereochemistry, make the necessary corrections. Use the Set Stereochemistry tool in the View Compound toolbox (see How to use the toolbox). Save each modified compound to the shelf.
  7. Generate a conformational model. Use the Tools/Generate Conformational Model menu item of the View Compound workbench. Use the Best conformer-generation mode; generating 80 conformers in this mode can take approximately 30 min for a compound of about 80 atoms. The conformer-generation program automatically creates the number of conformers needed to cover the conformational space of the molecule (see Introduction to conformer generation).
    Upon completing conformer generation in Catalyst, the new conformers become part of the compound and can be viewed and evaluated with the Tools/Show Conformational Model menu item. If conformers are generated in a background process, they can be brought into Catalyst using the Data/Process Information menu item.
  8. Now you are ready to set up your Generate Hypothesis workbench (see Introduction to the tabular report/spreadsheet).

To set up and generate a hypothesis automatically

After preparing your lead set of molecules, generating a conformational model for each one (see Preparation for hypothesis generation), and putting activity data into your spreadsheet, you choose the functions to be used and then set up a background process to generate a hypothesis:

  1. In the Generate Hypothesis workbench, check that your spreadsheet lists all compounds that you want to use for generation and that their tested activities and uncertainties are correct. This input spreadsheet should be saved to the shelf.
    The compounds in your lead set do not need to be on the shelf in the Generate Hypothesis workbench, but they must be in a lab or the Stockroom.
  2. Select (from the shelf) the saved spreadsheet containing the lead set compounds and their activity data.
  3. Select the Tools/Generate Hypothesis menu item. A control panel appears.
  4. Enter the name you want to give to the generated hypothesis in the Output Hypothesis entry box.
    The next step is to choose the set of chemical functions that you want to be considered during hypothesis generation. Use the Feature Selection portion of the control panel to choose a maximum of five types.
  5. If you know which types of functions are likely to be a significant part of your hypothesis, select them one at a time from the Dictionary list box and then click the Add button.
    The hypothesis-generation program analyzes various combinations of function types (within the minimum and maximum limits you set) and chooses the combination that best accounts for the structures and activities in the lead set. The run time increases substantially as the number of function types increases, so choose the fewest function types that can still provide a good description.
    One way of identifying the "important" functions is to drag the most active training set compound into the View Hypothesis workbench and use the Show Function Mapping menu item to indicate what functions are represented in the molecule. This can be used as an indicator of which functions may be important in contributing to the activity of this and other molecules (see the Show Function Mapping menu item description).
    Recommendations for choosing functions for the hypothesis. Choose the NEG CHARGE function if you have full negative charges in your training set molecules and the NEG IONIZABLE function if you have protonated acidic functions in your training set, but not both types. In a similar manner, use either POS CHARGE or POS IONIZABLE function, but not both. Also, use either HB ACCEPTOR or HB ACCEPTOR (lipid) function, but not both.
    As you select each function, its name appears in the Selected Function Definitions list box. Next to the name are default values for the minimum and maximum number of instances of the function allowed in the hypothesis.
  6. If you want to delete a function from the selection list, select its name on the Selected Function Definitions list and then click Remove.
  7. To change the default maximum or minimum number of instances for a function that you want to include in the hypothesis, select the Edit button.
    The Feature Editor control panel appears, with the present values of Minimum count and Maximum count.
    A minimum count of 1 means that you want at least one feature of a particular function type. You can also limit the number of features of a given type in the generated hypothesis. For example, you could specify a maximum count of 2 for HB DONOR functions.
    (Generally the defaults are appropriate for the initial hypothesis.)
    Limit on location constraints in hypothesis generation. A generated hypothesis may have up to five features, and each feature must have at least one location constraint. However, a generated hypothesis may have no more than seven location constraints in total. In addition, to properly characterize vector features such as hydrogen-bond donors and acceptors, each must have a location constraint at both the heavy-atom location and the projected-point location. Therefore, if you have two vector features, which together use four location constraints, you can have at most three other nonvector features (4 + 3 = 7 location constraints).
    Be aware also that the simplest generated hypotheses are 1) a null hypothesis, which consists of an average activity estimate and no functions, and 2) a hypothesis with four location constraints (two vector functions or four nonvector functions, for example).
  8. Specify the Total Features in the hypothesis by entering the minimum (Min) and maximum (Max) limits. The hypothesis generated will not have fewer than the Min value (range 1-5) nor more than the Max value (range 1-5).
  9. Hypothesis generation has the following parameters that can be reviewed and modified after selecting the More Hypothesis Options button:
  10. The next part of the setup process is to choose the computer on which to run the background task and the time to start, using the Job Options part of the control panel:
    Keep in mind that a simple hypothesis-generation run can take more than 12 hours on an Indigo-class computer. If you generate a hypothesis in the background on a computer while it is being used interactively, the demands of hypothesis generation will noticeably slow other operations.
    Choose to run on your computer by selecting the Locally button. Or choose to run on a different computer on the network by selecting the Remotely on button. The names of the available computers on the network are listed in the list box.
    When you select a host computer, its name is highlighted on the list and displayed in the Remote Host text box.
  11. Enter values for Start Time, Queue After, Process Name, and Local Directory.
  12. When the parameters are correctly entered, select the OK button. If you want to close the control panel without doing anything, select the Cancel button.
  13. Recheck that all parameters are correctly set in the Generate Hypothesis control panel, then select the Generate button. The necessary directory and files are set up to run the HypoGen program. During the setup time, even if you sent the job to a remote computer, the Catalyst interface is grayed out and the cursor is displayed as a clock symbol. When the setup process successfully completes, an Alert message informs you that the setup is completed and the process will be started at the requested time. (See When a background process starts for additional information.)
    To check the status of the background process and manage the data generated, see the Process Information menu item.
  14. HypoGen calculates several cost parameters that can be used to estimate the likelihood of generating valid hypothesis models. These cost values are generated within the first 15 min of the job and can be viewed in the *.log and *.full files found in the run directory.
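The feature and location-constraint limits in the steps above can be checked with simple arithmetic. The following Python sketch is hypothetical and is not part of Catalyst or HypoGen: it assumes that a vector feature (such as HB DONOR or HB ACCEPTOR) uses exactly two location constraints and every other feature uses one, and it enumerates the feature compositions allowed by per-type (minimum, maximum) counts. The function-type names and the helper functions are illustrative labels only.

```python
from itertools import product

# Limits taken from the text; the helper names below are hypothetical.
MAX_FEATURES = 5              # up to five features in a generated hypothesis
MAX_LOCATION_CONSTRAINTS = 7  # at most seven location constraints in total
VECTOR_CONSTRAINTS = 2        # heavy-atom location + projected-point location
NONVECTOR_CONSTRAINTS = 1     # assumption: one location constraint each


def location_constraints(counts, vector_types):
    """Total location constraints for a composition such as {"HB DONOR": 1}."""
    return sum(
        n * (VECTOR_CONSTRAINTS if t in vector_types else NONVECTOR_CONSTRAINTS)
        for t, n in counts.items()
    )


def is_allowed(counts, vector_types, total_min=1, total_max=5):
    """True if the total feature count and constraint budget are within limits."""
    total = sum(counts.values())
    return (
        total_min <= total <= min(total_max, MAX_FEATURES)
        and location_constraints(counts, vector_types) <= MAX_LOCATION_CONSTRAINTS
    )


def compositions(limits, vector_types, total_min=1, total_max=5):
    """Yield every composition within the per-type (min, max) counts."""
    types = list(limits)
    ranges = [range(limits[t][0], limits[t][1] + 1) for t in types]
    for combo in product(*ranges):
        counts = dict(zip(types, combo))
        if is_allowed(counts, vector_types, total_min, total_max):
            yield counts


if __name__ == "__main__":
    vec = {"HB DONOR", "HB ACCEPTOR"}
    # Example from the text: 2 vector features (4 constraints) + 3 nonvector (3) = 7.
    print(is_allowed({"HB DONOR": 1, "HB ACCEPTOR": 1, "HYDROPHOBIC": 3}, vec))
    # Two donors + one acceptor + two nonvector features need 8 constraints.
    print(is_allowed({"HB DONOR": 2, "HB ACCEPTOR": 1, "HYDROPHOBIC": 2}, vec))
    two = {"HB DONOR": (0, 2), "HYDROPHOBIC": (0, 3)}
    three = {**two, "HB ACCEPTOR": (0, 2)}
    print(len(list(compositions(two, vec))), len(list(compositions(three, vec))))
```

With these illustrative limits, two function types give 11 candidate compositions and three give 27. This count is only a rough illustration of why run time grows as you select more function types; it says nothing about how HypoGen actually searches.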

Quitting Catalyst when a background process is scheduled or running

After setting up a background process, you can exit Catalyst and even log off your computer. That is, terminating your Catalyst session or logging out does not terminate the background process, nor does the background process require you to be logged on or to have an active Catalyst session for it to run as scheduled.


Items in the Generate Hypothesis workbench

The Generate Hypothesis workbench provides tools that can generate and analyze hypotheses and compare them with sets of lead compounds. The functions of the parts of the workbench are described here.

Shelf

The shelf in the Generate Hypothesis workbench holds hypotheses, compounds, and spreadsheets needed in hypothesis-generation operations.

Menu bar

The Generate Hypothesis workbench's menu bar provides functions appropriate for generating and analyzing hypotheses:

Status area

The status area of the Generate Hypothesis workbench provides brief reports on the status and results of operations in the workbench, including reports on the number of entries in the tabular report/spreadsheet.

3D workspace

The 3D workspace in the Generate Hypothesis workbench is a display area for viewing and analyzing compounds and hypotheses. You can also do some editing of hypotheses in the workspace.

Toolbox

The toolbox of the Generate Hypothesis workbench provides tools relevant to working with hypotheses: Tether, Measure, Erase, Fit to Window, Tile Objects.

QuickTool area

The QuickTool area of the Generate Hypothesis workbench enables you to use the View Hypothesis QuickTool. See Introduction to the QuickTools.

Entry boxes

The Edit, Set Activity, and Current Hypothesis entry boxes are located between the 3D workspace and the tabular report, as is the Set Activity button. They are used as follows:

Edit entry box

Use the Edit entry box for entering and modifying data in the tabular report/spreadsheet, as follows:

  1. To modify data cells in tabular reports, click the data cell that you want to change. Its current value is shown in the Edit entry box.
  2. Click in the Edit entry box. The cursor changes from a caret to a blinking vertical bar.
  3. Delete the displayed data in the Edit entry box and type in the new data.
  4. Press the <Enter> key on the keyboard to enter the new data in the cell.

Set Activity entry box and button

The Set Activity controls allow you to select and use a different activity property for which you have training-set data. By default, the value in the Activ box is used for Score, Regress, and Generate Hypothesis actions. To use a different value for an activity or a different name:

  1. First add the new activity property to your Property Dictionary, using the Databases/Edit Property Dictionary menu item in the Stockroom.
  2. Then, to change the activity property to be used, click the Set Activity button in the Generate Hypothesis workbench.
    The Set Activity Property control panel appears.
  3. Click the name of the new activity property in the list box.
  4. Select the Set button. The control panel is closed and the name of the new activity property is shown in the text box as the current activity property.

Current Hypothesis

The Current Hypothesis entry box displays the name of the hypothesis to be used in fit operations, if one has been selected. The current hypothesis selection is used by the Tools/Show Selected Compounds/Mappings menu item, which can also be called by double-clicking one of the rows in the spreadsheet. (Other actions that require a hypothesis selection look for a selected hypothesis on the shelf.)

To select a current hypothesis, drag a saved hypothesis and drop it into the entry box. Its name is displayed, and it now can be used for quickly performing fits to compounds in your spreadsheet.

To remove a current hypothesis designation, use the Edit/Clear Current Hypothesis menu item.

Tabular report/spreadsheet

The unique features of the Generate Hypothesis workbench are related to its tabular report and spreadsheet format for input and output of compound and hypothesis information.


The data cells are arranged in columns in the Generate Hypothesis tabular report/spreadsheet. Some columns contain fixed cells, which means they always display the same property and you cannot modify them (such as Row and Name).

Some columns contain variable cells, which you can change to display a different property (such as Activ, Uncert, and Mol Wt). To display a different property in a variable-cell column, double-click the column header. If the column contains variable cells, a Change Report Property control panel appears. Select the new property to be displayed in the column from the list of properties and then click the Change button. The control panel is closed and the new property name appears in the column header in the tabular report.

The default cells in the tabular report are defined and used as follows:

  1. Row. Number of the row in the report (automatically entered with the compound). To select a row, click the row number.
    To display the compound in the 3D workspace, double-click the appropriate row number (if no current hypothesis is selected). If a current hypothesis is selected (its name appears in the Current Hypothesis text box), double-click the row number to perform a quick Compare/Fit on the compound with the current hypothesis. See Comparing a compound and a hypothesis.
  2. Name. Compound name (automatically entered when the compound is dragged into the report).
  3. Activ. The biological activity of the compound (automatically entered from StockroomDB, if available, or typed in as input data).
  4. Uncert. The uncertainty in the compound's activity value, expressed as a ratio: the reported value divided by the minimum value, and the maximum value divided by the reported value (automatically entered with a default of 3.00).
  5. Color. Unique color applied to the compound for identification when displayed during fits with hypotheses in 3D workspace (automatically assigned when the compound is entered, but the color can be changed by clicking a color cell and using the color wheel control panel).
  6. Estimate. The estimated activity of the compound based upon the generated hypothesis (output from program).
  7. Error. A measure of how accurately the hypothesis estimates the activity of the compound. Computed as the ratio of the tested activity (Activ) to the activity estimated by the hypothesis (Estimate), or the inverse if Estimate is greater than Activ, so the value is always at least 1 (output from program).
  8. Mol Wt. Molecular weight of compound (automatically computed and entered).
  9. Mol Formula. Chemical formula of compound (automatically computed).
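The Uncert and Error columns are simple ratios. This Python sketch uses hypothetical helper names (it is not a Catalyst API) and assumes that Uncert is a symmetric multiplicative factor, so the reported activity divided by Uncert gives the minimum value and the reported activity multiplied by Uncert gives the maximum. Error is computed as described in the Error entry above.

```python
from typing import Tuple


def activity_range(activ: float, uncert: float = 3.0) -> Tuple[float, float]:
    """Min/max activity implied by Uncert (assumed multiplicative and symmetric)."""
    if activ <= 0 or uncert < 1:
        raise ValueError("activity must be positive and uncertainty must be >= 1")
    return activ / uncert, activ * uncert


def error_ratio(activ: float, estimate: float) -> float:
    """Activ/Estimate, or its inverse when Estimate > Activ; always >= 1."""
    if activ <= 0 or estimate <= 0:
        raise ValueError("activities must be positive")
    return activ / estimate if activ >= estimate else estimate / activ


if __name__ == "__main__":
    print(activity_range(12.0))     # default Uncert of 3.00
    print(error_ratio(10.0, 2.5))   # estimate too low by a factor of 4
    print(error_ratio(2.5, 10.0))   # estimate too high by a factor of 4
```

Because Error is folded to a value of at least 1, an estimate that is four times too high and one that is four times too low both give an Error of 4.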

How to use the workbench

To open a Generate Hypothesis workbench

To open a Generate Hypothesis workbench, click the Generate Hypothesis button in the toolbar.

Setting up for and generating hypotheses

Using Hypotheses

Other Tasks


Last updated April 2000.
Copyright © 1997-2000 Molecular Simulations, Inc. All rights reserved.