Into the Twilight Zone
In addition to working on ways to organize the annotated
human genome, Abagyan works actively to contribute to the
annotation itself, using homology modeling and docking.
Homology modeling has traditionally been a tool for determining
which functional family a gene or a protein belongs to by
comparing a sequence from an unknown protein or gene to a
database of known entities.
Obviously, no two genes will be exactly alike, but we could
try to predict if the unknown protein would adopt a similar
fold as a known one.
Occasionally, a conserved active site is enough to identify
an unknown gene's function. In fact, HIV-1 protease was
identified shortly after the sequence of the HIV genome was
published because its gene contained a known aspartic proteolytic
But even without an obvious active site, two nearly identical
sequences of DNA from two different organisms would definitely
code for proteins of similar function and nearly identical fold;
the differences would be in the conformation of the side chains
and the loops in the peptide backbone.
The problem arises in what is referred to as the "twilight
zone." Typically, scientists employ an arbitrary cutoff, which
means that any genes that are similar to a certain degree,
say thirty percent, will be treated as homologous. Conversely,
any two genes whose similarity falls below this cutoff will
be ignored.
But the sequence similarity disappears long before the structural
or functional similarities do, and two genes that have only
fifteen to thirty percent identity may code for proteins that
have the same function, even though they would be missed by
a homology search. These false negatives are said to be in
the twilight zone.
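The cutoff logic described above can be sketched in a few lines of Python. This is only an illustration, not the procedure any particular homology search uses: the function names are hypothetical, the sequences are assumed to be already aligned (gaps written as "-"), and percent identity is computed over non-gap columns, which is one of several conventions in use. The 30 and 15 percent thresholds are the figures quoted in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of non-gap alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def classify(identity: float, cutoff: float = 30.0, floor: float = 15.0) -> str:
    """Label a pair using the cutoff (30%) and twilight range (15-30%) from the text."""
    if identity >= cutoff:
        return "homologous"
    if identity >= floor:
        return "twilight zone"
    return "ignored"
```

A pair scoring 20 percent would be labeled "twilight zone" here: below the cutoff, and so dropped by a conventional search, yet possibly coding for proteins with the same function.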
"One of the goals is to be able to see in the twilight zone,"
says Abagyan. He works on new procedures to align sequences
involving large gaps, dissimilar fragments in the middle of
an alignment, and iterative chains of sequence comparisons.
He proposed the "multilink recognition" algorithm in 1996
and used it to recognize remote similarities.
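The idea of an "iterative chain of sequence comparisons" can be illustrated with a toy example. The sketch below is not the multilink recognition algorithm itself, whose details are not given here; it only shows the general principle that two sequences with no detectable direct similarity may still be connected through intermediates. The function name and the input format (a dictionary mapping each sequence name to the names it is directly similar to) are assumptions made for illustration.

```python
from collections import deque


def find_chain(query, target, similar):
    """Breadth-first search for the shortest chain of directly similar sequences.

    `similar` maps a sequence name to the set of names judged similar
    to it by some direct pairwise comparison (hypothetical input).
    Returns the chain as a list of names, or None if no chain exists.
    """
    queue = deque([[query]])
    seen = {query}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # sorted() only makes the traversal order deterministic
        for neighbor in sorted(similar.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

If A is similar to B and B is similar to C, the search links A to C through B even when A and C show no direct similarity.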
These predicted similarities can then be given to biochemists,
who can determine whether the assumed function of the enzyme
is correct. Furthermore, recognized similarities may indicate
similar folds and similar crystallization conditions, information
that can be given to structural biologists to speed up their
Model Building and Protein-Ligand Docking
Another tool Abagyan uses to annotate the genome is docking.
"You take a protein," he says, "and you ask,
'What small molecule binds to it?' and 'Can
I design a small molecule that will inhibit it?'"
The basis for docking ligands to certain receptors comes
from knowledge of the atomic structures of the receptors themselves,
which is acquired through biophysical techniques, such as
X-ray crystallography and nuclear magnetic resonance. Experimentally
determined structures may not be necessary for each receptor,
though. "Very often the active site is close enough to the
homologue that binding studies can be done without having
a complete structure," says Abagyan. "It gives you something
to work with."
Given the structure, a model by homology, or a presumed
binding site model of the target, one tries to insert any
number of ligands into the binding site, scoring them
according to how well they fit in that site.
This is all done computationally. The molecular structures
are represented in a three-dimensional coordinate system in
which all the parts of the two molecules can interact with
each other electrostatically, sterically, hydrophobically,
and through hydrogen bond formation. Possible conformations
are then searched to find the global free energy minimum,
the so-called best fit. The best-fitting ligand is the one
that makes the most favorable interactions with the binding
site.
Applying this basic technique, Abagyan generally subjects
target receptors to hundreds of thousands of commercially
available compounds. The flexible docking procedure samples
hundreds of possible conformations of the ligand in the surface
pockets of the receptor and assigns a score to the ligand.
The score is used to rank and order the entire chemical collection.
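The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual docking software: the function name and input format are hypothetical, the per-conformation scores are assumed to have been computed elsewhere, and lower scores are assumed to mean more favorable binding, as with an energy.

```python
def rank_ligands(conformation_scores):
    """Rank ligands by the best (lowest) score among their sampled conformations.

    `conformation_scores` maps a ligand name to a list of scores, one per
    sampled conformation (hypothetical input; lower is assumed better).
    Returns a list of (ligand, best_score) pairs, best ligand first.
    """
    best = {
        name: min(scores)
        for name, scores in conformation_scores.items()
        if scores  # ignore ligands with no sampled conformations
    }
    return sorted(best.items(), key=lambda item: item[1])
```

Taking the first few dozen or hundred entries of the returned list would give the short list of candidates from the full collection.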
The end result will be several dozen or several hundred virtual
inhibitors: lead compounds that can then be taken into
the laboratory and used as inhibitors, or as scaffolds to create
The most difficult part, says Abagyan, is following up on
these computational techniques with experimental ones: structural
studies, synthesis of lead candidates, molecular binding assays,
cell-based assays, and so forth.
"You can't do this alone," he says.