Abstract

During the last few years, we have seen enormous strides in our abilities to sequence genomes, and the information that has poured out of these sequences is quite astonishing. With more than 150 complete genome sequences now available and many laboratories rushing into microarray analysis, proteomic initiatives, and even systems biology, it seems an appropriate time to consider not just the opportunities those sequences present, but also their shortcomings. By far the most serious problem is the quality and degree of completeness of the annotation of those genomes. Most troublesome are the large numbers of open reading frames that have been identified by computer programs, but remain labeled as a “conserved hypothetical protein” when they occur in more than one genome or simply a “hypothetical protein” when they appear unique to the genome in question. Between them, these two categories of annotated open reading frames often represent more than half of the potential protein-coding regions of a genome. These annotations highlight just one portion of our ignorance about the information content of genomes and our lack of fundamental knowledge about the function of so many of the building blocks of cells. Unless we rectify this situation, it is likely to undermine many of the other “-omic” efforts currently underway. Here I advocate a rather straightforward approach to address this problem, focused initially on the bacterial genomes. In contrast to the numerous proposals for big science initiatives to understand the fundamental workings of biological organisms, I propose a small science, relatively low-tech approach that could have a dramatic payoff. A relatively small investment could yield a massive amount of information that would greatly enhance our current efforts to use genomic approaches to study life.

Highlights

  • I would encourage a consortium of bioinformaticians to produce a list of all of the conserved hypothetical proteins that are found in multiple genomes, to carry out the best possible bioinformatics analysis, and to offer those proteins to the biochemical community as potential targets for research into their function

  • I would make a pitch for including all genes in Mycoplasma genitalium, which, as the free-living organism with the fewest genes, might be the most suitable as a model system for in-depth understanding of its biology

Summary

Initial Proposal

The initial proposal is directed at deciphering the role of the “hypothetical proteins” encoded in the microbial genomes and would involve a community-wide approach to determine the function of these hypotheticals based on solid, old-fashioned biochemistry. Upon completion of the project and identification of the function, participating researchers would receive a further supplement to their grant as a reward. In this way, one might hope to rally some of the best biochemical talent and apply it to the problem of determining function for a wide range of new proteins. The cost of such an operation could be quite minimal, and the bureaucracy and review process could be simple.
