After the Genome
"The Genome, We Are Sure, Is Packed with Subtleties"
Paul Schimmel, Professor, Department of Molecular
It's very exciting. None of us know what treasures
lie beneath the sequence.
There has been a huge capital investment, not only
by the government through the National Institutes of
Health, the National Cancer Institute, and the National
Science Foundation, but also through private foundations,
like the American Cancer Society, the Howard Hughes
Medical Institute, the Wellcome Trust, [and] private
industry, particularly on the entrepreneurial side.
All of [these organizations] have large programs trying
to understand, at the beginning, the function of the
proteins that are encoded by the human genome. Rarely
have we seen capital investment coming from so many
corners focused on one problem.
Over the next 10 years, I believe probably 90 percent
of the proteins will have an assigned function. Maybe
that's too optimistic, but it's certainly within reach.
What is much harder to come to grips with is how [to]
put it all together to make an organism. How does this
all fit together? There are approaches being used by
diverse groups trying to knock out genes and relate
them to phenotypes, particularly related to embryonic
development and differentiation.
The genome, we are sure, is packed with subtleties (the
expression on your face, body language, intuitive faculties,
gestures, the things that we do that we don't even think
about). These are things that we don't understand
at all in a detailed sense as they relate to the genome,
but more and more we're getting the feeling that [these
subtleties] are genetically encoded. They are part of
this array that we just don't understand.
That's where the advances need to be made. What are
these genes? Even if you know the proteins, how do they
work to generate a highly sophisticated organism?
I think that we will have all the pieces to the jigsaw
puzzle figured out ("this must be part of a lake and
this must be part of a forest over here, and this must
be part of a house over here"). Putting them together
to get the whole picture is very difficult.
How long that will take is harder to say. Will it be in the
next 100 years? That's a good question. I do believe
the end result will be that humans will have a sense
of how you go from a puffer fish to a mouse to a human, organisms
with similar numbers of genes and many of the same
genes, but obviously [leading to] very different outcomes.
Analysis and Genetic Diversity in Yeast and Malaria
Elizabeth Winzeler, Assistant Professor, Department
of Cell Biology
One of the big areas of investigation in the post-genome
era will be assigning function to the genes that are
predicted in the genome project. One of the techniques
that I am most familiar with is expression profiling.
As genome sequences become available, it's easy to create
arrays [of various nucleotide sequences] that can then
interrogate every gene in the genome. Then, by hybridizing
the RNA from different tissues or disease states or
different stages of an organism's life cycle, you can
start determining when a gene is probably transcriptionally
active, and that actually gives you quite a bit of information
about the potential functional role for that gene.
This can really go a long way towards narrowing down
the list of potentially interesting targets that you
might want to concentrate on if you are involved in
the drug discovery process.
I started working on post-genome functional analysis
in [the budding yeast] Saccharomyces in 1996,
right after the genome sequence was released, and I
became involved in a number of different projects: developing
tools for expression profiling as well as creating knockout
strains for every gene in the yeast genome. I'm still
doing a little bit of yeast research. For example, we
[also] recently used oligonucleotide arrays to map all
of the chromosomal origins of DNA replication (there
are about 400 in yeast) by isolating DNA fractions
that were enriched for origin activity and then hybridizing
the fractions to high density oligonucleotide arrays.
We have also used oligonucleotide arrays to study genetic
diversity in yeast. Usually, only one strain or individual
representative from a particular organism is sequenced.
By comparing the patterns which result when genomic
DNA is hybridized to arrays, we can find out how closely
related different strains are. I've looked at 10 or
11 different yeast isolates. I think this technology
is going to be very interesting to population geneticists
in the future. You can get a much more descriptive look
at the genome, and you can find regions of the genome
that are evolving at faster rates.
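As a rough illustration of the comparison described above, the sketch below treats each strain's hybridization result as a list of per-probe signals, calls a probe "lost" when its signal falls well below the reference strain's signal, and scores relatedness as the share of probes on which two strains agree. The isolate names, the signal values, and the 0.5 threshold are invented for illustration. They are assumptions, not data or parameters from the study.

```python
def lost_probes(signal, reference, threshold=0.5):
    """Return indices of probes whose signal is below threshold * reference signal."""
    return {
        i
        for i, (s, r) in enumerate(zip(signal, reference))
        if s < threshold * r
    }


def similarity(a, b, reference, threshold=0.5):
    """Fraction of probes on which two strains agree (both hybridize or both are lost)."""
    lost_a = lost_probes(a, reference, threshold)
    lost_b = lost_probes(b, reference, threshold)
    disagreements = len(lost_a ^ lost_b)  # probes lost in exactly one strain
    return 1 - disagreements / len(reference)


# Hypothetical signals for six probes; the reference is the sequenced strain.
reference = [100, 120, 90, 110, 105, 95]
strains = {
    "isolate_A": [98, 30, 88, 108, 20, 97],
    "isolate_B": [101, 25, 91, 40, 22, 93],
}

sim_ab = similarity(strains["isolate_A"], strains["isolate_B"], reference)
print(f"isolate_A vs isolate_B: {sim_ab:.2f}")
```

A real analysis would need normalization across arrays and a statistically justified call for each probe. The point here is only that shared patterns of signal loss give a simple measure of how closely two isolates are related.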
In the past couple years, I've been working on applying
this type of technology to organisms that are more difficult
to work with and are more relevant to human health.
The malaria parasite has a genome size that is about
two times as large as yeast. The sequence has been done
for about six months, and the annotations should become
available [soon]. The parasite also has both haploid
and diploid phases, like Saccharomyces, but has
a complex life cycle involving both humans and mosquitoes,
is difficult to maintain in culture, and has gene function
that cannot be studied using classical forward genetics.
Malaria is a major health problem worldwide. There
are 300 million cases a year, and there has been a resurgence
in the number of cases because of drug resistance. Many
inexpensive anti-malarials are no longer effective.
While genetic studies are difficult, it's relatively
easy to get RNA from all the different stages of the
parasite's life cycle, and this offers us new ways to
study gene function in the parasite. In the past year,
I've designed an oligonucleotide array that contains
about 500,000 probes to two different Plasmodium
genomes [a mouse strain, and the human strain].
The array we designed at TSRI arrived a month or two
ago, and what we are doing now is collecting RNA samples
from many different conditions. We're exposing parasites
to drugs to identify new genes involved in [resistance]
pathways. We're hybridizing genomic DNA in order to
characterize genetic diversity in different field isolates
and find out how similar or different the isolates are.
Eventually, we'd like to take these tools into the field
and map the spread of drug resistance.
If you start doing longitudinal studies after you introduce
a new drug, you might be able to identify the drug targets
or the mechanisms of resistance, because we predict
we will see pockets of variability developing within
the genome over time that are associated with the drug's
target. This may lead to new knowledge about the mechanisms
of drug resistance. If you can start finding the mutations
that are associated with drug resistance, then that
tells you how to treat patients in the field.
Main Reason to Sequence The Genome Was to Facilitate Positional Cloning
Bruce Beutler, Professor, Department of Immunology
It will take a very long time to close the phenotype
gap. The fact is, there are about 34,000 genes, give
or take a few thousand. If you add up all the phenotypes
known from mutations in humans and from knockouts in
mice, you come up with about 5,000. So something like
six out of seven genes don't have an essential function
attached to them yet.
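The "six out of seven" figure follows from the two numbers quoted above. The short calculation below makes the arithmetic explicit, under the simplifying assumption (ours, not the speaker's) that each known phenotype corresponds to one distinct gene.

```python
genes = 34_000             # approximate gene count quoted above
known_phenotypes = 5_000   # phenotypes from human mutations and mouse knockouts

# Assumption: one known phenotype accounts for one gene.
fraction_known = known_phenotypes / genes
fraction_unknown = 1 - fraction_known

print(f"with a known phenotype:    {fraction_known:.3f}")
print(f"without a known phenotype: {fraction_unknown:.3f}")
print(f"six out of seven:          {6 / 7:.3f}")
```

Roughly one gene in seven has a phenotype attached, which leaves about six in seven without one.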
The way that people go about identifying phenotypes
now is to mutate every gene in the genome and keep certain
phenotypes of interest to them under surveillance. In
this way, in principle, one can find every gene that
is required for a particular function. Once you have
a phenotype, then comes the problem of finding the particular
mutation that caused it. That's done by positional cloning.
That's where sequencing the genome has been particularly
In fact, the main reason to sequence the genome was
to facilitate positional cloning. I think a lot of people
don't realize that. It's a rapid way to find the function
In the old days, when you positionally cloned something,
you first had to map the mutation. By following meiosis,
you would confine the mutation to a point between two
markers on the chromosome, hopefully a very small
area, less than a million base pairs long. Second, you
would have to clone all the DNA from end-to-end across
that area. Third, you would have to find all the genes
that were candidates in that area. And finally, you
would have to find the mutation.
The sequencing of the genome has made it so that you
don't have to do steps two and three anymore. You no
longer have to clone all the DNA across the area, because
the sequence is known. And you no longer have to look
for genes because, in principle, they've all been found
and annotated. Now the limiting factor in finding mutations
is doing the genetic mapping, and that might take about
a year. Then finding the gene, in theory, should be
trivial. It used to be that the process of cloning the
critical region and identifying candidates would, by
themselves, take several years. So things have gotten
a lot easier.
"You Can't Get Too Hung Up On Any One Protein"
Ian Wilson, Professor, Department of Molecular
The overall plan for the Joint Center for Structural
Genomics is to try to produce as many new structures
as possible. By "new" we mean ones for which you can't
predict the fold from the sequence. However, a lot of
these will turn out to be similar structures to others.
For example, we have recently worked on a protein that
is less than 15 percent identical to anything in the
Protein Data Bank, and we found out its structure is
[almost] identical to another protein.
To start off, we've been concentrating on one organism,
Thermotoga maritima, to see how much of it we can
clone, express, purify, crystallize, collect synchrotron
data, determine the structure, and deposit in the databank.
In collaboration with Scott Lesley of GNF, we're trying
to see how many proteins from that one organism we can
pass through the various steps of the pipeline that
are required [for] high-throughput structural genomics.
The other organism that we're currently working on
is C. elegans. These are likely to be much more
difficult proteins to express. They're more complex,
but they're more representative of eukaryotic organisms,
such as mouse and human [the specific organism]. Here,
we are concentrating on proteins that are likely to
have novel folds or at least have folds that cannot
be predicted at present.
For proteins that we are really interested in, we can
also look for homologues and orthologues in other organisms.
But in structural genomics, you can't get too hung up
on any one protein, because it's a numbers game. The
goal, which the NIH suggests that we should be able
to achieve, is, in year four [of the project], to produce
100 to 200 structures per year. That comes down to nearly
one every working day. And within four to six weeks
from the time we have finished refining the structure,
we have to deposit it into the Protein Data Bank.
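The "nearly one every working day" estimate can be checked with a quick calculation. The figure of 250 working days per year is an assumption made here for illustration. The source states only the target of 100 to 200 structures per year.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption: about 50 weeks of 5 working days


def structures_per_working_day(structures_per_year, working_days=WORKING_DAYS_PER_YEAR):
    """Average number of structures that must be completed each working day."""
    return structures_per_year / working_days


for target in (100, 200):
    rate = structures_per_working_day(target)
    print(f"{target} structures/year -> {rate:.1f} per working day")
```

Under that assumption, the upper end of the target works out to four structures every five working days, and the lower end to two.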
That's what we're working towards and that's what we're
trying to achieve. And since everything is deposited
in the public domain, that information is accessible
to everybody. Thus, the structures produced by structural
genomics should enable the work of biologists, molecular
biologists, and cell biologists worldwide.