View Hypothesis Workbench
Shape searching can mean many different things. Catalyst shape searching is shape similarity searching: a shape query defines a volume in space that is the union of a set of spheres. During a search, each candidate molecule is converted to a similar volume, the two volumes are aligned, and their intersection and union are approximated. The similarity of the shape query to the molecule is the volume of the intersection divided by the volume of the union of the aligned shapes. This is therefore not shape containment searching: depending on the similarity tolerance, a hit may have more or less volume protruding from the shape. This fuzziness makes the search very robust. When using a mixed query (see "Aligning a shape to features within a query"), you can add excluded volumes to the query to eliminate hits with protrusions in undesirable locations.
To construct a shape query from a molecule, first drag and drop the compound icon onto the 3D graphics area, then click the Convert Molecule to Shape button. The displayed molecule is replaced with the newly created shape query, and the Shape Tolerances dialog box appears. Enter the desired shape tolerances and click OK. To save the shape query, select Save To Lab As... from the Data menu.
You can import a set of aligned molecules as a shape query by choosing Import SD File As Shape Query... from the Data menu. The molecules in the .sd file are assumed to be already aligned. The atom coordinates are taken as the centers of the shape's spheres, and each atom's element type determines the radius of the corresponding sphere. The resulting shape query is the union of the shapes of the aligned molecules.
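The union-of-spheres construction can be sketched in a few lines of Python. This is a minimal illustration, not Catalyst's implementation: the radius table (approximate Bondi van der Waals radii), the fallback radius, and the function names `molecule_to_spheres` and `shape_query_from_aligned` are assumptions, and the molecules are passed in as already aligned `(element, x, y, z)` tuples rather than parsed from an .sd file.

```python
# Hypothetical element-to-radius table (approximate Bondi van der Waals radii,
# in Angstroms). Catalyst's actual mapping is not documented here.
ELEMENT_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
DEFAULT_RADIUS = 1.70  # assumed fallback for elements not in the table


def molecule_to_spheres(atoms):
    """Turn (element, x, y, z) atoms into (x, y, z, radius) spheres."""
    return [(x, y, z, ELEMENT_RADII.get(el, DEFAULT_RADIUS)) for el, x, y, z in atoms]


def shape_query_from_aligned(molecules):
    """Union of the shapes of already aligned molecules.

    A union of sphere sets is simply all of their spheres taken together;
    overlapping spheres are harmless because only the enclosed volume matters.
    """
    spheres = []
    for atoms in molecules:
        spheres.extend(molecule_to_spheres(atoms))
    return spheres


mol_a = [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)]
mol_b = [("N", 0.1, 0.0, 0.0)]
query = shape_query_from_aligned([mol_a, mol_b])
print(len(query))  # 3 spheres
```

Keeping the query as a flat list of spheres means the union never has to be computed explicitly; it is resolved only when volumes are sampled during a search.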
With the mapping displayed, select the Convert Molecule to Shape command to convert the displayed conformer into a shape query.
A shape in a mixed query is processed somewhat differently from a shape-only query. With a shape-only query, the shape is aligned to each candidate compound, the similarity is computed, and if the similarity tolerance is satisfied, the candidate is a hit. The shape in a mixed query is likewise aligned to the candidate compound and the tolerance is tested. In addition, the transformation used to align the location constraints to the molecule is applied to the shape, and the similarity of the shape to the molecule is then computed without any further alignment. If this similarity value also satisfies the tolerance, the candidate is recorded as a hit. A hit from a mixed query search can therefore be interpreted as a molecule that (1) satisfies the feature-only query, (2) has similarity to the shape greater than the specified tolerance, and (3) satisfies the shape similarity tolerance when the orientation of the shape is fixed relative to the location constraints.
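The three conditions for a mixed-query hit can be expressed as a short decision function. This is a sketch of the logic described above, not Catalyst code: the hooks `match_features`, `align_shape`, `apply_transform`, and `similarity` are hypothetical stand-ins for the feature matcher, the shape aligner, and the volume-overlap calculation, and treating a value equal to the tolerance as a hit (`>=`) is an assumption.

```python
from typing import Any, Callable, Optional

# Hypothetical hook types; shapes, molecules and transforms are opaque values.
AlignFn = Callable[[Any, Any], Any]        # align_shape(shape, mol) -> freely aligned shape
SimilarityFn = Callable[[Any, Any], float]  # similarity(shape, mol), no re-alignment
MatchFn = Callable[[Any], Optional[Any]]    # match_features(mol) -> transform or None
TransformFn = Callable[[Any, Any], Any]     # apply_transform(shape, transform) -> shape


def is_shape_only_hit(shape: Any, mol: Any, align_shape: AlignFn,
                      similarity: SimilarityFn, tolerance: float) -> bool:
    """Shape-only query: align the shape to the candidate, then test the tolerance."""
    aligned = align_shape(shape, mol)
    return similarity(aligned, mol) >= tolerance  # '>=' is an assumption


def is_mixed_hit(shape: Any, mol: Any, match_features: MatchFn,
                 align_shape: AlignFn, apply_transform: TransformFn,
                 similarity: SimilarityFn, tolerance: float) -> bool:
    """Mixed query: all three conditions from the text must hold."""
    # (1) The feature-only part must match; keep its alignment transform.
    transform = match_features(mol)
    if transform is None:  # 'is None', so a falsy transform such as 0.0 still counts
        return False
    # (2) The freely aligned shape must satisfy the similarity tolerance.
    if not is_shape_only_hit(shape, mol, align_shape, similarity, tolerance):
        return False
    # (3) Fix the shape's orientation with the feature transform and
    #     re-test the similarity without any further alignment.
    fixed = apply_transform(shape, transform)
    return similarity(fixed, mol) >= tolerance


# Toy usage: shapes and molecules are 1-D positions; a "transform" is a shift.
def toy_similarity(s, m):
    return 1.0 / (1.0 + abs(s - m))


def toy_align(s, m):
    return m  # free alignment superimposes perfectly


def toy_shift(s, t):
    return s + t


# Free alignment passes, but the feature-fixed orientation misses the molecule.
print(is_mixed_hit(5.0, 2.0, lambda m: 0.0, toy_align, toy_shift, toy_similarity, 0.8))   # False
# The feature transform moves the shape onto the molecule.
print(is_mixed_hit(5.0, 2.0, lambda m: -3.0, toy_align, toy_shift, toy_similarity, 0.8))  # True
```

The first toy call shows why condition (3) matters: a molecule can resemble the shape when aligned freely yet still fail once the shape is locked to the feature alignment.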
To set the tolerances of a shape query, select the query by single-clicking it, then choose the Constraints/Set Shape Query Tolerances... command. This opens the Shape Tolerances dialog box.
The same dialog box also opens when you click the Convert Molecule to Shape button, or when you select a shape query and click the caliper tool button.
The Minimum Percent Extent and Maximum Percent Extent are used during 4D filtering to screen out molecules whose extents are much larger or smaller than those of the query shape. Each percentage is entered as a fraction between 0 and 1. Similarly, the Percent Box Volume Match is used during 4D filtering to eliminate molecules whose volumes are much larger or smaller than that of the query shape. The main effect of adjusting these tolerances is that search speed may change when searching large databases.
The Similarity Tolerance is the primary parameter controlling which hits are returned from shape searching; it is the final criterion for determining a hit. The similarity is computed as the volume of the intersection of the two aligned shapes divided by the volume of their union. The maximum similarity value is therefore 1 (identical shapes), and the minimum is 0 (no overlap).
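Written as a formula, with $A$ the aligned query shape, $B$ the candidate molecule's shape, $V(\cdot)$ the volume, and $\tau$ the Similarity Tolerance (symbols introduced here for illustration; whether a value exactly equal to $\tau$ counts as a hit is an assumption):

```latex
S(A, B) = \frac{V(A \cap B)}{V(A \cup B)}
        = \frac{V(A \cap B)}{V(A) + V(B) - V(A \cap B)},
\qquad 0 \le S(A, B) \le 1,
\qquad \text{hit if } S(A, B) \ge \tau
```

Because $A \cap B$ is contained in $A \cup B$, the ratio can never exceed 1, and it reaches 1 only when the two aligned shapes enclose the same volume.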
The Grid Resolution and Bit Volume Padding parameters control how the similarity value is approximated. The volumes of the query shape and the molecule shape are represented on a three-dimensional grid by marking the grid points that lie within each shape. The grid spacing is set by the Grid Resolution parameter, in Angstroms. To allow differently sized shapes to be compared, the grid extends beyond the extents of the query shape by the Bit Volume Padding, also in Angstroms.