Science Talk: After the Genome
   
       
 
         
"The Genome, We Are Sure, Is Packed with Subtleties"
Paul Schimmel, Professor, Department of Molecular Biology  
             It's very exciting. None of us know what treasures lie beneath 
              the sequence. 
             There has been a huge capital investment, not only by the government 
              through the National Institutes of Health, the National Cancer Institute, 
              and the National Science Foundation, but also through private foundations, 
              like the American Cancer Society, the Howard Hughes Medical Institute, 
              the Wellcome Trust, [and] private industry, particularly on the 
              entrepreneurial side. 
             All of [these organizations] have large programs trying to understand, 
              at the beginning, the function of the proteins that are encoded 
              by the human genome. Rarely have we seen capital investment coming 
              from so many corners focused on one problem. 
             Over the next 10 years, I believe probably 90 percent of the proteins 
              will have an assigned function. Maybe that's too optimistic, but 
              it's certainly within reach. 
             What is much harder to come to grips with is how [to] put it all 
              together to make an organism. How does this all fit together? There 
              are approaches being used by diverse groups trying to knock out 
              genes and relate them to phenotypes, particularly related to embryonic 
              development and differentiation. 
The genome, we are sure, is packed with subtleties: the expression on your face, body language, intuitive faculties, gestures, the things that we do that we don't even think about. These are things that we don't understand at all in a detailed sense as they relate to the genome, but more and more we're getting the feeling that [these subtleties] are genetically encoded. They are part of this array that we just don't understand.
             That's where the advances need to be made. What are these genes? 
              Even if you know the proteins, how do they work to generate a highly 
              sophisticated organism? 
             I think that we will have all the pieces to the jigsaw puzzle figured 
              out ("this must be part of a lake and this must be part of a forest 
              over here, and this must be part of a house over here"). Putting 
              them together to get the whole picture is very difficult. 
How long that will take is harder to say. Will it be in the next 100 years? That's a good question. I do believe the end result will be that humans will have a sense of how you go from a puffer fish to a mouse to a human: organisms with similar numbers of genes and many of the same genes, but obviously [leading to] very different outcomes.
              

Functional Analysis and Genetic Diversity in Yeast and Malaria
Elizabeth Winzeler, Assistant Professor, Department of Cell 
              Biology 
             One of the big areas of investigation in the post-genome era will 
              be assigning function to the genes that are predicted in the genome 
              project. One of the techniques that I am most familiar with is expression 
              profiling. As genome sequences become available, it's easy to create 
              arrays [of various nucleotide sequences] that can then interrogate 
              every gene in the genome. Then, by hybridizing the RNA from different 
              tissues or disease states or different stages of an organism's life 
              cycle, you can start determining when a gene is probably transcriptionally 
              active, and that actually gives you quite a bit of information about 
              the potential functional role for that gene. 
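As a rough illustration of the expression-profiling idea, the sketch below assumes a much simplified array readout: one normalized hybridization intensity per gene per sample, and a single hypothetical detection threshold above which a gene is called transcriptionally active. Real arrays use many probes per gene and statistical present/absent calls; the gene names, sample labels, intensities, and threshold here are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical detection cutoff in arbitrary intensity units; real arrays
# use many probes per gene and statistical present/absent calls.
DETECTION_THRESHOLD = 200.0


def active_conditions(
    intensities: Dict[str, Dict[str, float]],
    threshold: float = DETECTION_THRESHOLD,
) -> Dict[str, List[str]]:
    """For each gene, list the samples (tissue, disease state, or life-cycle
    stage) in which its hybridization signal exceeds the threshold."""
    return {
        gene: sorted(s for s, value in by_sample.items() if value > threshold)
        for gene, by_sample in intensities.items()
    }


def specific_to(calls: Dict[str, List[str]], sample: str) -> List[str]:
    """Genes called active in `sample` and nowhere else: a short list of
    candidates whose function may be tied to that condition."""
    return sorted(gene for gene, samples in calls.items() if samples == [sample])


# Invented example: three genes measured in three samples.
data = {
    "geneA": {"liver": 850.0, "brain": 40.0, "tumor": 910.0},
    "geneB": {"liver": 30.0, "brain": 25.0, "tumor": 640.0},
    "geneC": {"liver": 300.0, "brain": 310.0, "tumor": 295.0},
}
calls = active_conditions(data)
print(calls["geneA"])               # ['liver', 'tumor']
print(specific_to(calls, "tumor"))  # ['geneB']
```

A gene like geneB, switched on only in one condition, is the kind of result that narrows a list of potential targets.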
             This can really go a long way towards narrowing down the list of 
              potentially interesting targets that you might want to concentrate 
              on if you are involved in the drug discovery process. 
I started working on post-genome functional analysis in [the budding yeast] Saccharomyces in 1996, right after the genome sequence was released, and I became involved in a number of different projects: developing tools for expression profiling as well as creating knockout strains for every gene in the yeast genome. I'm still doing a little bit of yeast research. For example, we [also] recently used oligonucleotide arrays to map all of the chromosomal origins of DNA replication (there are about 400 in yeast) by isolating DNA fractions that were enriched for origin activity and then hybridizing the fractions to high-density oligonucleotide arrays.
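The interview does not describe how the origin-enriched hybridizations were analyzed. As a hedged sketch of one generic approach, the code below assumes each probe is also hybridized with an unenriched reference sample, computes a per-probe log2 enrichment ratio, and calls a candidate origin wherever a run of adjacent probes stays above a cutoff. The reference sample, the cutoff of 1.0 (a twofold enrichment), the `min_probes` rule, and the signal values are all assumptions for illustration.

```python
import math
from typing import List, Optional, Tuple


def log2_enrichment(enriched: List[float], reference: List[float]) -> List[float]:
    """Per-probe log2 ratio of the origin-enriched DNA fraction to an
    unenriched reference hybridization (probes in chromosome order)."""
    return [math.log2(e / r) for e, r in zip(enriched, reference)]


def call_origins(
    ratios: List[float], cutoff: float = 1.0, min_probes: int = 2
) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) probe-index ranges where at least
    `min_probes` consecutive probes exceed `cutoff`; each range is a
    candidate replication origin."""
    origins: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing -inf sentinel closes any run still open at the end.
    for i, value in enumerate(ratios + [float("-inf")]):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            if i - start >= min_probes:
                origins.append((start, i - 1))
            start = None
    return origins


# Invented signals for nine consecutive probes.
enriched = [100.0, 110.0, 420.0, 460.0, 400.0, 95.0, 105.0, 380.0, 90.0]
reference = [100.0] * 9
print(call_origins(log2_enrichment(enriched, reference)))  # [(2, 4)]
```

Requiring several adjacent probes is a simple guard against a single noisy probe (index 7 above) being called an origin.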
             We have also used oligonucleotide arrays to study genetic diversity 
              in yeast. Usually, only one strain or individual representative 
              from a particular organism is sequenced. By comparing the patterns 
              which result when genomic DNA is hybridized to arrays, we can find 
              out how closely related different strains are. I've looked at 10 
              or 11 different yeast isolates. I think this technology is going 
              to be very interesting to population geneticists in the future. 
              You can get a much more descriptive look at the genome, and you 
              can find regions of the genome that are evolving at faster rates. 
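The strain comparison described above can be sketched in a few lines. This assumes each probe has already been scored as hybridizing normally (True) or losing signal relative to the sequenced reference strain (False), which suggests a sequence difference under that probe. The distance between two strains is then the fraction of probes whose calls disagree, and probes that vary among strains flag regions that may be diverging faster. Strain names and calls are invented.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def probe_distance(a: List[bool], b: List[bool]) -> float:
    """Fraction of probes whose hybridization call differs between two strains."""
    if len(a) != len(b):
        raise ValueError("strains must be scored on the same probe set")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(calls: Dict[str, List[bool]]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances for every pair of strains (names sorted)."""
    return {
        (s, t): probe_distance(calls[s], calls[t])
        for s, t in combinations(sorted(calls), 2)
    }


def variable_probes(calls: Dict[str, List[bool]]) -> List[int]:
    """Probe indices where not all strains agree; clusters of such probes
    point to regions that may be evolving faster."""
    rows = list(calls.values())
    return [i for i in range(len(rows[0])) if len({row[i] for row in rows}) > 1]


# True = probe hybridizes as strongly as to the sequenced reference;
# False = signal lost, suggesting a sequence difference under the probe.
calls = {
    "reference": [True] * 8,
    "isolate_1": [True, True, False, True, True, True, True, True],
    "isolate_2": [True, False, False, True, False, True, True, False],
}
for pair, d in distance_table(calls).items():
    print(pair, d)
print(variable_probes(calls))  # [1, 2, 4, 7]
```

A table like this can feed a standard clustering or tree-building step to show how closely related the isolates are.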
In the past couple of years, I've been working on applying this type of technology to organisms that are more difficult to work with and are more relevant to human health. The malaria parasite has a genome about twice the size of the yeast genome. The sequence has been done for about six months, and the annotations should become available [soon]. The parasite also has both haploid and diploid phases, like Saccharomyces, but it has a complex life cycle involving both humans and mosquitoes, is difficult to maintain in culture, and its gene function cannot be studied using classical forward genetics.
             Malaria is a major health problem worldwide. There are 300 million 
              cases a year, and there has been a resurgence in the number of cases 
              because of drug resistance. Many inexpensive anti-malarials are 
              no longer effective. 
While genetic studies are difficult, it's relatively easy to get RNA from all the different stages of the parasite's life cycle, and this offers us new ways to study gene function in the parasite.
              In the past year, I've designed an oligonucleotide array that contains 
              about 500,000 probes to two different Plasmodium genomes 
              [a mouse strain, and the human strain]. The array we designed at 
              TSRI arrived a month or two ago, and what we are doing now is collecting 
              RNA samples from many different conditions. We're exposing parasites 
              to drugs to identify new genes involved in [resistance] pathways. 
              We're hybridizing genomic DNA in order to characterize genetic diversity 
              in different field isolates and find out how similar or different 
              the isolates are. Eventually, we'd like to take these tools into 
              the field and map the spread of drug resistance. 
             If you start doing longitudinal studies after you introduce a new 
              drug, you might be able to identify the drug targets or the mechanisms 
              of resistance, because we predict we will see pockets of variability 
              developing within the genome over time that are associated with 
              the drug's target. This may lead to new knowledge about the mechanisms 
              of drug resistance. If you can start finding the mutations that 
              are associated with drug resistance, then that tells you how to 
              treat patients in the field. 

"The Main Reason to Sequence the Genome Was to Facilitate Positional Cloning"
Bruce Beutler, Professor, Department of Immunology 
             It will take a very long time to close the phenotype gap. The fact 
              is, there are about 34,000 genes, give or take a few thousand. If 
              you add up all the phenotypes known from mutations in humans and 
              from knockouts in mice, you come up with about 5,000. So something 
              like six out of seven genes don't have an essential function attached 
              to them yet. 
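The six-out-of-seven figure follows from the two round numbers quoted, under the speaker's simplifying assumption that each known phenotype marks a different gene:

```latex
\[
\frac{5{,}000}{34{,}000} \approx 0.147 \approx \frac{1}{7},
\qquad
1 - 0.147 = 0.853 \approx \frac{6}{7}
\]
```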
             The way that people go about identifying phenotypes now is to mutate 
              every gene in the genome and keep certain phenotypes of interest 
              to them under surveillance. In this way, in principle, one can find 
              every gene that is required for a particular function. Once you 
              have a phenotype, then comes the problem of finding the particular 
              mutation that caused it. That's done by positional cloning. That's 
              where sequencing the genome has been particularly useful. 
             In fact, the main reason to sequence the genome was to facilitate 
              positional cloning. I think a lot of people don't realize that. 
              It's a rapid way to find the function of genes. 
In the old days, when you positionally cloned something, you first had to map the mutation. By following meiosis, you would confine the mutation to a region between two markers on the chromosome, hopefully a very small one, less than a million base pairs long. Second, you would have to clone all the DNA from end to end across that area. Third, you would have to find all the genes that were candidates in that area. And finally, you would have to find the mutation.
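The first step, confining the mutation between two flanking markers by following meiosis, can be sketched as follows. This is a simplified illustration, not a description of any particular lab's method: it assumes a fully penetrant mutation, error-free marker genotypes, and that in each affected animal the markers still linked to the mutation form one contiguous block. A crossover that separates a marker from the mutation in any affected animal places the mutation on the other side of that marker. Positions and genotypes are invented.

```python
from typing import List, Optional, Tuple


def critical_interval(
    positions: List[int], genotypes: List[List[bool]]
) -> Tuple[Optional[int], Optional[int]]:
    """Flanking markers of the region that cosegregates with the mutation.

    positions: marker positions in base pairs, sorted ascending.
    genotypes: one row per affected animal; True where the marker carries
    the allele inherited along with the mutation, False where a meiotic
    crossover has separated marker and mutation.
    Returns the positions of the nearest recombinant markers on each side
    (None means no recombinant marker on that side yet)."""
    n = len(positions)
    # Markers that never recombined away from the mutation in any animal.
    shared = [all(row[i] for row in genotypes) for i in range(n)]
    if not any(shared):
        raise ValueError("no marker cosegregates with the mutation in every animal")
    first = shared.index(True)
    last = n - 1 - shared[::-1].index(True)
    left = positions[first - 1] if first > 0 else None
    right = positions[last + 1] if last + 1 < n else None
    return left, right


# Invented data: five markers, two affected animals, each with one crossover.
positions = [100_000, 400_000, 700_000, 900_000, 1_300_000]
genotypes = [
    [True, True, True, True, False],
    [False, False, True, True, True],
]
print(critical_interval(positions, genotypes))  # (400000, 1300000)
```

Here the mutation is confined to a 900,000 base pair interval; every additional informative meiosis can only keep or shrink it.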
             The sequencing of the genome has made it so that you don't have 
              to do steps two and three anymore. You no longer have to clone all 
              the DNA across the area, because the sequence is known. And you 
              no longer have to look for genes because, in principle, they've 
all been found and annotated. Now the limiting factor in finding mutations is doing the genetic mapping, and that might take about a year. Then finding the gene, in theory, should be trivial. It used to be that cloning the critical region and identifying candidates would, by themselves, take several years. So things have gotten a lot easier.
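With a finished, annotated sequence, steps two and three reduce to a lookup: list the annotated genes that overlap the mapped interval. The sketch assumes a hypothetical in-memory annotation table with inclusive base-pair coordinates; in practice the annotation would come from a genome database, and the remaining candidates would then be resequenced to find the mutation.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # assumed 1-based, inclusive coordinates in base pairs
    end: int


def candidate_genes(annotation: List[Gene], left: int, right: int) -> List[str]:
    """Names of annotated genes overlapping the critical interval [left, right]."""
    return [g.name for g in annotation if g.start <= right and g.end >= left]


# Invented annotation for one chromosome and an invented mapped interval.
annotation = [
    Gene("gene1", 150_000, 180_000),
    Gene("gene2", 390_000, 420_000),
    Gene("gene3", 800_000, 850_000),
    Gene("gene4", 1_400_000, 1_450_000),
]
print(candidate_genes(annotation, 400_000, 1_300_000))  # ['gene2', 'gene3']
```

The overlap test keeps gene2 even though it starts just outside the interval, since part of it lies inside.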

"You Can't Get Too Hung Up on Any One Protein"
Ian Wilson, Professor, Department of Molecular Biology 
               
             The overall plan for the Joint Center for Structural Genomics is 
              to try to produce as many new structures as possible. By "new" we 
              mean ones for which you can't predict the fold from the sequence. 
              However, a lot of these will turn out to be similar structures to 
              others. For example, we have recently worked on a protein that is 
less than 15 percent identical to anything in the Protein Data Bank, and we found out that its structure is [almost] identical to that of another protein.
To start off, we've been concentrating on one organism, Thermotoga maritima, to see how many of its proteins we can clone, express, purify, crystallize, collect synchrotron data on, solve the structures of, and deposit in the databank. In collaboration with Scott Lesley
              of GNF, we're trying to see how many proteins from that one organism 
              we can pass through the various steps of the pipeline that are required 
              [for] high-throughput structural genomics. 
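Tracking "how many proteins we can pass through the various steps" amounts to a funnel count. The sketch below is hypothetical: the stage names follow the list given above, while the target IDs and outcomes are invented. It records only the last stage each target completed and counts how many targets reached at least each stage.

```python
from typing import Dict, List, Tuple

# Pipeline stages in the order listed above.
STAGES = [
    "clone", "express", "purify", "crystallize",
    "collect data", "solve structure", "deposit",
]


def funnel(last_completed: Dict[str, str]) -> List[Tuple[str, int]]:
    """Given the last stage each target protein completed, count how many
    targets got at least as far as each stage."""
    rank = {stage: i for i, stage in enumerate(STAGES)}
    counts = [0] * len(STAGES)
    for stage in last_completed.values():
        for i in range(rank[stage] + 1):
            counts[i] += 1
    return list(zip(STAGES, counts))


# Invented outcomes for four hypothetical targets.
outcomes = {
    "target_1": "deposit",
    "target_2": "purify",
    "target_3": "crystallize",
    "target_4": "clone",
}
for stage, n in funnel(outcomes):
    print(f"{stage:>15}: {n}")
```

Counting cumulative survivors per stage makes it easy to see where attrition is worst, which is what matters in a numbers game.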
The other organism that we're currently working on is C. elegans. These are likely to be much more difficult proteins to express. They're more complex, but they're more representative of eukaryotic organisms, such as mouse and human. Here, we are concentrating on proteins that are likely to have novel folds, or at least folds that cannot be predicted at present.
             For proteins that we are really interested in, we can also look 
              for homologues and orthologues in other organisms. But in structural 
              genomics, you can't get too hung up on any one protein, because 
              it's a numbers game. The goal, which the NIH suggests that we should 
              be able to achieve, is, in year four [of the project], to produce 
              100 to 200 structures per year. That comes down to nearly one every 
working day. And within four to six weeks from the time we have finished refining each structure, we have to deposit it in the Protein Data Bank.
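The "nearly one every working day" figure checks out at the upper end of the target, assuming roughly 250 working days in a year (an assumption; the interview gives no figure):

```latex
\[
\frac{100\ \text{structures/yr}}{250\ \text{working days/yr}} = 0.4,
\qquad
\frac{200\ \text{structures/yr}}{250\ \text{working days/yr}} = 0.8\ \text{structures per working day}
\]
```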
             That's what we're working towards and that's what we're trying 
              to achieve. And since everything is deposited in the public domain, 
              that information is accessible to everybody. Thus, the structures 
              produced by structural genomics should enable the work of biologists, 
              molecular biologists, and cell biologists worldwide.
 
 