
View Hypothesis Workbench


Clustering Hypotheses

The Cluster Hypotheses... command is available under the Tools menu in the View Hypothesis workbench. It provides a graphical user interface for clustering hypotheses: select the hypotheses from the shelf, and then issue the command.

The Cluster Hypotheses dialog box opens when you select this command:

Pair-wise root-mean-square fits are performed on all pairs of hypotheses, and the mapping function F_ij = pairs + 1/(RMS + 1.01) is evaluated for each mapping, where pairs is the number of blobs common to both hypotheses in that mapping. By default, RMS is the minimum root-mean-square displacement between the hypotheses. Catalyst can also be instructed to consider the differences between the feature weights and/or tolerances by setting the .Catalyst parameters compare.considerWeight and compare.considerTolerance to 1. The RMS is then calculated as:

where N is the number of matching pairs, disp is the displacement between the hypotheses, Tol_ij is the tolerance of the ith feature on the jth hypothesis, Tol_ik is the tolerance of the corresponding feature on the kth hypothesis, and >Tol is the larger tolerance value of the two features. A similar notation is used for the weights.

All possible mappings between two hypotheses are evaluated with the F_ij function, and the largest value is used for that pair of hypotheses. We define a similarity function S_ij = F_ij/sqrt(F_ii * F_jj) and a distance function D_ij = 1 - S_ij. With this normalization, a hypothesis compared with itself has S_ii = 1 and D_ii = 0. The values of D_ij for all pairs of hypotheses are then passed to a clustering algorithm.
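The chain from mapping score to distance can be illustrated with a minimal Python sketch. It is not Catalyst code: the function names, the example pair counts and RMS values, and the assumption that a hypothesis compared with itself matches all of its features with an RMS of 0 are hypothetical and chosen only to make the arithmetic concrete.

```python
import math

def mapping_score(pairs, rms):
    """F = pairs + 1/(RMS + 1.01) for one mapping between two hypotheses."""
    return pairs + 1.0 / (rms + 1.01)

def distance_matrix(F):
    """Turn best-mapping scores F[i][j] into distances D_ij = 1 - S_ij,
    where S_ij = F_ij / sqrt(F_ii * F_jj)."""
    n = len(F)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            similarity = F[i][j] / math.sqrt(F[i][i] * F[j][j])
            D[i][j] = 1.0 - similarity
    return D

# Hypothetical data: three hypotheses with four features each.
# Assumption: a self-comparison matches all four features with RMS = 0.
self_score = mapping_score(4, 0.0)
F = [
    [self_score, mapping_score(3, 0.5), mapping_score(2, 1.2)],
    [mapping_score(3, 0.5), self_score, mapping_score(2, 0.9)],
    [mapping_score(2, 1.2), mapping_score(2, 0.9), self_score],
]
D = distance_matrix(F)

for row in D:
    print(["%.3f" % value for value in row])
```

Because the number of matched pairs enters F as an integer while the RMS term never exceeds 1/1.01, a mapping with more common features always scores higher than one with fewer, regardless of fit quality; the RMS term only ranks mappings with the same pair count.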

Currently, there are three different clustering methods:

  1. Hierarchical clustering analysis: single linkage;

  2. Hierarchical clustering analysis: complete linkage; and

  3. Hierarchical clustering analysis: average linkage.

You can select any one of the three clustering methods and then execute the command.

The results of the clustering are written in ASCII format to a file, clusterHypoResults_i.txt, in the current directory and are also displayed in a window in the Catalyst interface:

Once you see the clustering information, you can decide how to proceed. For example, you could merge the hypotheses (two at a time) that belong in the same cluster using the Merge Hypotheses/Features command also available within the Tools menu in the View Hypothesis workbench.

Hierarchical Cluster Analysis

In Hierarchical Cluster Analysis (HCA), there is a nested family of clusterings, which is organized according to the value of an objective function. It is this objective function that distinguishes the HCA methods.

The nested family of clusterings begins with each model in its own singleton cluster. All pairs of clusters are then examined in turn, and the pair that yields the smallest value of the objective function is fused into a new cluster. This process is repeated until all models are fused into one cluster.

Objective functions of the HCA methods are as follows. If a clustering is given, then:

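For single, complete, and average linkage, the objective function is the distance between two clusters, and the two clusters with the smallest distance are fused. The cluster distance is the minimum (single linkage), the maximum (complete linkage), or the average (average linkage) of the distances between model pairs formed by taking one model from each cluster, where the distance between model pairs refers to the distance, D_ij, described above.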
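The fusion procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Catalyst's implementation: it uses the standard textbook definitions of the three linkages (minimum, maximum, and mean of the pair distances between two clusters), and the function names and the four-model distance matrix are hypothetical.

```python
def cluster_distance(D, a, b, method):
    """Distance between clusters a and b, given as lists of model indices."""
    pair_distances = [D[i][j] for i in a for j in b]
    if method == "single":
        return min(pair_distances)
    if method == "complete":
        return max(pair_distances)
    if method == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError("unknown linkage method: " + method)

def hierarchical_clustering(D, method="single"):
    """Return the fusions in order as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(D))]  # start from singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters and keep the closest one.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(D, clusters[x], clusters[y], method)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        fused = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(fused)
    return merges

# Hypothetical distance matrix D_ij for four models.
D = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
for method in ("single", "complete", "average"):
    print(method, hierarchical_clustering(D, method))
```

In this example all three methods fuse the same pairs in the same order; only the distance reported for the final fusion differs, which is where the choice of linkage shows up in the nested family of clusterings.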




Last updated April 8, 1998 at 05:43pm PDT.
Copyright © 1999, Molecular Simulations Inc. All rights reserved.