Abstract
One of the rewards of completing, or essentially completing, the genomic sequence of reference organisms is to get an idea of the number of genes it takes to build an organism. The current consensus is that 6142 protein-encoding genes keep the budding yeast Saccharomyces cerevisiae alive and well, while the worm Caenorhabditis elegans has on the order of 19,099 genes. It was somewhat gratifying for Drosophilists to learn this year that the fly might be able to make do in life with fewer genes than the worm does; Adams et al. (2000) predicted 13,601 genes from the sequence of the 120-Mb euchromatic portion of the Drosophila melanogaster genome. The remaining 60 Mb of the fly genome is heterochromatic, and most of it is unclonable. Of the portion of heterochromatin that is cloned, only small bits have been sequenced, and these fragments cannot be easily aligned because of interruptions by repetitive sequences. However, genetic studies predict that heterochromatin will contribute at least several dozen genes, and perhaps substantially more, to the total gene count (Gatti and Pimpinelli 1992). Hence, the estimate of 13,601 is a conservative one for the gene number in D. melanogaster, but just how conservative it may be is an open question.

How many more hundreds or thousands of genes remain to be discovered for Drosophila? How can we best go about the business of finding these genes and deciphering their functions? Yeast researchers have set the gold standard for addressing such questions in functional genomics, because they can delete each of the predicted open reading frames in the yeast genome and examine the consequences in vivo (Winzeler et al. 1999). Unfortunately, such approaches cannot be applied comprehensively to organisms that lack efficient methods for gene disruption or to those that have complex genomes and hundreds of cell types to assay for phenotypes.
For Drosophila, the gold standard for evaluating gene number and function is provided by the 2.9-Mb Adh region, the most thoroughly understood region of the fly’s genome (Ashburner et al. 1999). Drosophilists dream about having as comprehensive a knowledge of the remaining 98.5% of the genome as Ashburner and colleagues have provided for the Adh region. In reality, such in-depth understanding was a hard-won victory. Extensive genetic and molecular analyses carried out over a span of several decades and annotation efforts carried out over a span of two years account for the high confidence level in the gene estimates for the Adh region (Ashburner 2000). In this issue, Andrews et al. (2000) describe a practical route to gene discovery, one they prove to be useful for Drosophila and one that can be applied to other multicellular organisms. The strategy is based on the analysis of expressed sequence tags (ESTs) from a defined tissue. The use of ESTs for gene discovery is not a novel idea (Adams et al. 1991; Rubin et al. 2000); however, the results of Andrews et al. (2000) are particularly timely and satisfying given the current status of the Drosophila Genome Project. The key to their success was the application of both computational and microarray approaches to characterize the properties of their new collection of ESTs. These approaches allowed them to assess the complexity of the EST collection, its relationship to in vivo expression profiles, and its redundancy with other available EST banks. Once the potential of this EST collection to provide new information was established, Andrews et al. (2000) demonstrated that the unique ESTs provided biological evidence for the existence of hundreds of predicted genes, newly discovered genes, or transcript forms. 
The success of this analysis led the authors to propose that gene identification in multicellular organisms could advance considerably by taking advantage of tissue differences in gene expression profiles. Thus, sampling a relatively modest number of ESTs (several thousand) from many different tissues could identify novel genes much faster than deeper probing of a few general libraries. In addition, as demonstrated here, generating a collection of tissue ESTs brings an extra reward: the ESTs can be used to learn something about the biology of the tissue of interest.

As the first step in this study, the authors asked whether their tissue source, the adult testis, expressed a sufficiently complex RNA population to be useful for large-scale EST analysis. Given that Drosophila males produce sperm with enormously long tails, the expression profile of the testis could have been dominated by a small number of transcript types, for example, those encoding structural components of the tail.

Corresponding author. E-MAIL wakimoto@u.washington.edu; FAX (206) 543-3041. Article and publication are at www.genome.org/cgi/doi/10.1101/gr.169400.
Insight/Outlook