
View Hypothesis Workbench


Integrating shape into a hypothesis

Shape searches

A Catalyst shape is defined by a set of three-dimensional coordinates, each with a corresponding radius. Typically, these points are the centers of the atoms in a query molecule, and the radius associated with each point corresponds to the atom type. Shapes can be constructed automatically from a single conformer of a molecule or from a set of aligned molecules in an SD file. A shape query can be merged with a hypothesis to create a combined query.

Shape searching can have many different meanings. Catalyst shape searching is shape similarity searching: a shape query defines a volume in space that is the union of a set of spheres. During a search, each candidate molecule is converted to a similar volume. The two volumes are aligned, and their intersection and union are approximated. The similarity of the shape query to the molecule is computed as the volume of the intersection divided by the volume of the union of the aligned shapes. Note that this is not shape containment searching. Depending on the similarity tolerance that is set, a hit may have more or less volume that protrudes from the shape. This fuzziness makes the search robust to small differences between the query and the candidate. When using a mixed query (see "Aligning a shape to features within a query"), excluded volumes may be added to the query to eliminate hits with protrusions in undesirable locations.
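The similarity measure described above can be sketched in a few lines. This is only an illustration, not Catalyst's implementation: the function names shape_similarity and is_hit are hypothetical, each volume is assumed to be already aligned and represented as a set of sample points, and treating a similarity exactly equal to the tolerance as a hit is an assumption.

```python
def shape_similarity(points_a, points_b):
    """Intersection over union of two aligned volumes.

    Each volume is given as a collection of sample points (hypothetical
    representation); the point counts stand in for the volumes.
    """
    a, b = set(points_a), set(points_b)
    union = a | b
    if not union:
        return 0.0  # two empty volumes: define the similarity as 0
    return len(a & b) / len(union)


def is_hit(similarity, tolerance):
    """Assumption: a candidate is a hit when similarity reaches the tolerance."""
    return similarity >= tolerance


# Two volumes that share 2 of their 4 distinct sample points.
query = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
candidate = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
similarity = shape_similarity(query, candidate)  # 2 / 4 = 0.5
```

The candidate in this example protrudes from the query at (3, 0, 0) and leaves (0, 0, 0) unfilled, yet it is still a hit for any tolerance of 0.5 or lower, which is the sense in which this is similarity rather than containment searching.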

Creating a shape query from a molecule or set of aligned molecules

The Convert Molecule to Shape command, accessible from the View Hypothesis Workbench, allows you to convert molecules to shape queries.

[Convert Shape button on View Hypothesis Workbench]

To construct a shape query from a molecule, first drag and drop the compound icon onto the 3D graphics area, then click the Convert Molecule to Shape button. The displayed molecule is replaced with the newly created shape query, and the Shape Tolerances dialog box appears. Enter the desired values for the shape tolerances and click OK. Select Save To Lab As... from the Data menu to save the shape query.

You can import a set of aligned molecules as a shape query by choosing Import SD File As Shape Query... from the Data menu.  The .sd file is assumed to contain aligned molecules. The coordinates of the atoms are taken as the centers of the shape's spheres, and the atom's element type is used to determine the radius of the corresponding sphere. The resulting shape query is the union of the shapes of the aligned molecules.
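As a rough sketch of the construction described above, the following builds a shape as a list of spheres from atom coordinates and element types. The function names are hypothetical, and the radii table is an assumption (commonly quoted van der Waals radii in Angstroms); the radii that Catalyst actually assigns to each atom type are not given here.

```python
# Assumed radii in Angstroms (commonly quoted van der Waals values).
# These are NOT taken from Catalyst; they only illustrate the lookup.
ASSUMED_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def molecule_to_spheres(atoms):
    """Turn (element, x, y, z) atoms into (x, y, z, radius) spheres."""
    return [(x, y, z, ASSUMED_RADII[element]) for element, x, y, z in atoms]


def shape_from_aligned(molecules):
    """Union of the shapes of already aligned molecules.

    Because a shape is itself a union of spheres, the union of several
    shapes is simply the combined list of all their spheres.
    """
    spheres = []
    for atoms in molecules:
        spheres.extend(molecule_to_spheres(atoms))
    return spheres


# Two tiny, already aligned "molecules" (coordinates are made up).
mol_a = [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)]
mol_b = [("N", 0.1, 0.0, 0.0)]
shape = shape_from_aligned([mol_a, mol_b])
```

No alignment is performed in this sketch, mirroring the assumption that the molecules in the file are already aligned.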

Aligning a shape to features within a query

To create a mixed query (containing both shape and 3D features with location constraints):

A shape in a mixed query is processed somewhat differently from a shape-only query. With a shape-only query, the shape is aligned to each candidate compound, the similarity is computed, and if the similarity tolerance is satisfied, the candidate is considered a hit. The shape in a mixed query is also aligned to the candidate compound, and the tolerance is tested. In addition, the transformation used to align the location constraints to the molecule is applied to the shape, and the similarity of the shape to the molecule is then computed without any further alignment. If this similarity value also satisfies the tolerance, the candidate is recorded as a hit. A hit from a mixed query search can therefore be interpreted as a molecule which (1) satisfies the feature-only query, (2) has similarity to the shape greater than the specified tolerance, and (3) satisfies the shape similarity tolerance when the orientation of the shape is fixed relative to the location constraints.
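The three conditions for a mixed-query hit can be written out as a small decision function. This is a sketch of the logic only: the function and parameter names are hypothetical, the two similarity values are assumed to have been computed elsewhere, and counting a similarity exactly equal to the tolerance as passing is an assumption.

```python
def mixed_query_hit(features_match, free_similarity, fixed_similarity, tolerance):
    """Decide whether a candidate is a hit for a mixed query.

    features_match   -- True if the candidate satisfies the feature-only query
    free_similarity  -- similarity after aligning the shape freely to the candidate
    fixed_similarity -- similarity with the shape held fixed relative to the
                        location constraints (no further alignment)
    tolerance        -- the shape similarity tolerance
    """
    if not features_match:            # condition (1)
        return False
    if free_similarity < tolerance:   # condition (2)
        return False
    return fixed_similarity >= tolerance  # condition (3)


# A candidate whose shape fits well on its own, but not in the orientation
# dictated by the features, is rejected.
good_fit_wrong_pose = mixed_query_hit(True, 0.72, 0.41, tolerance=0.6)
good_fit_right_pose = mixed_query_hit(True, 0.72, 0.65, tolerance=0.6)
```

Here good_fit_wrong_pose is False and good_fit_right_pose is True, which shows why a mixed query is stricter than running the feature query and the shape query separately.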

Performing shape searches directly on spreadsheets

You can search spreadsheets directly in Catalyst. Filtering through the 4D index works in the same way as 2D and 3D filtering.

To conduct a shape search:

Catalyst makes no distinction between shape queries and other types of queries. A shape object in a query is treated simply as another type of feature to be mapped. Thus, spreadsheets, StockroomDBs, and BigDBs can all be shape-searched. Filtering is applied whenever a 4D index is available.

Setting the tolerance of the shape query

The Set Shape Query Tolerances... command is available from the Constraints menu within the View Hypothesis Workbench. This command opens a dialog box in which you set the tolerances of a shape query.

Select a shape query by single-clicking it, then choose the Constraints/Set Shape Query Tolerances... command. This opens the following dialog box:

The same dialog box is also opened if you click the Convert Molecule to Shape button or if you select a shape query and click the caliper icon tool button.

The Minimum Percent Extent and Maximum Percent Extent are used during 4D filtering to screen out molecules whose extents are much larger or smaller than those of the query shape. Each percentage is entered as a fraction between 0 and 1. Similarly, the Percent Box Volume Match is used during 4D filtering to eliminate molecules whose volume is much larger or smaller than that of the query shape. The primary effect of adjusting these tolerances is a possible change in search speed when searching large databases.

The Similarity Tolerance is the primary parameter that controls which hits are returned from shape searching. This tolerance is the final criterion for determining a hit in a shape search. The similarity is computed as the volume of the intersection of two aligned shapes divided by the volume of their union. Thus, the similarity ranges from 0 (no overlap) to a maximum of 1 (identical volumes).

The Grid Resolution and Bit Volume Padding parameters control how the similarity value is approximated. The volumes of the query shape and molecule shape are represented by marking the points of a three-dimensional grid that lie within each shape. The spacing of the grid is determined by the Grid Resolution parameter and is measured in Angstroms. To allow differently sized shapes to be compared, the grid is constructed to be larger than the extents of the query shape by the amount of the Bit Volume Padding parameter, which is also measured in Angstroms.
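The grid approximation described above can be illustrated with a minimal sketch. It is not Catalyst's implementation: the function names are hypothetical, shapes are assumed to be lists of (x, y, z, radius) spheres that are already aligned, padding is assumed to be added on every side of the query's extents, and the default values of 0.5 and 2.0 Angstroms are arbitrary choices for the example.

```python
import math
from itertools import product


def make_grid(query, resolution, padding):
    """Grid points covering the query's extents, enlarged by `padding`
    on every side, with `resolution` spacing (both in Angstroms)."""
    axes = []
    for axis in range(3):
        low = min(s[axis] - s[3] for s in query) - padding
        high = max(s[axis] + s[3] for s in query) + padding
        count = int(math.floor((high - low) / resolution)) + 1
        axes.append([low + i * resolution for i in range(count)])
    return list(product(*axes))


def marked_points(grid, spheres):
    """Indices of the grid points that lie inside at least one sphere."""
    return {
        i
        for i, (px, py, pz) in enumerate(grid)
        if any(
            (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
            for x, y, z, r in spheres
        )
    }


def grid_similarity(query, molecule, resolution=0.5, padding=2.0):
    """Approximate intersection volume / union volume by counting points."""
    grid = make_grid(query, resolution, padding)
    in_query = marked_points(grid, query)
    in_molecule = marked_points(grid, molecule)
    union = in_query | in_molecule
    return len(in_query & in_molecule) / len(union) if union else 0.0


unit_sphere = [(0.0, 0.0, 0.0, 1.0)]
shifted_sphere = [(1.0, 0.0, 0.0, 1.0)]
partial = grid_similarity(unit_sphere, shifted_sphere)
```

In this sketch a finer resolution gives a closer approximation at the cost of more grid points, and any part of the molecule that lies beyond the padded grid is simply not counted, which is why the padding limits how much larger than the query a compared shape can usefully be.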



Last updated August 12, 1998 at 02:48pm PDT.

Copyright © 1999, Molecular Simulations Inc. All rights reserved.