Abstract

Complete functional annotation of genomes is a powerful tool for researchers; however, such annotation is a time-consuming task limited by the availability of experimental data. The function of genes for which there is no experimental data can often be predicted via comparison to related, annotated sequences of known function. We describe here the Reference Genome project, an effort from the Gene Ontology (GO) Consortium to fully annotate twelve genomes to rigorous standards: human, plus eleven organisms that are important models in biomedical research, including mouse, fly, zebrafish, yeast and E. coli. To achieve this, we examine existing experimentally based annotations in a phylogenetic context in order to infer the function(s) of ancestral proteins and propagate these annotations to their descendants. This endeavor faces many difficult challenges, including the determination and provision of reference protein sets for each genome; the identification of gene families for curation; the application of consistent best practices for annotation; the development of methodologies for evaluating progress towards our goal; and the development of software tools to support this effort. Annotated genomes are highly valuable to the research community and will provide the basis for using sequence similarity to annotate further genomes. An overview of the project, as well as links to all resources described below, can be found at http://geneontology.org/GO.refgenome.shtml

This work is supported by NHGRI grant HG002273 (Gene Ontology Consortium) and NIGMS grant GM081084-01A1 (Phylogenetic tree building and annotation software development).

Pascale Gaudet for the Reference Genome Group of the Gene Ontology Consortium. The Reference Genome project is overseen by Pascale Gaudet (dictyBase) and Rex Chisholm (dictyBase), and includes these representatives from the curatorial staff: Tanya Berardini (TAIR), Emily Dimmer (GOA), Stacia R. Engel (SGD), Petra Fey (dictyBase), David P. Hill (MGI), Doug Howe (ZFIN), Jim Hu (EcoliWiki), Rachael Huntley (GOA), Varsha K. Khodiyar (UCL), Ranjana Kishore (WormBase), Donghui Li (TAIR), Ruth C. Lovering (UCL), Fiona McCarthy (AgBase), Li Ni (MGI), Victoria Petri (RGD), Deborah A. Siegel (EcoliWiki), Susan Tweedie (FlyBase), Kimberly Van Auken (WormBase), and Valerie Wood (GeneDB); the following computational staff representatives: Siddhartha Basu (dictyBase), Seth Carbon (BBOP), Mary Dolan (MGI), and Christopher J. Mungall (BBOP); those establishing the protein families to be annotated: Kara Dolinski (PPOD), Michael S. Livstone (PPOD), and Paul Thomas (PANTHER); and the four PIs of the GO Consortium: Michael Ashburner (FlyBase), Judith A. Blake (MGI), J. Michael Cherry (SGD), and Suzanna E. Lewis (BBOP).

Highlights

  • The functional annotation of gene products, both proteins and RNAs, is a major endeavor that requires a judicious mix of manual analysis and computational tools

  • The Gene Ontology (GO) was developed within the community of the Model Organism Databases (MODs), whose goal is to annotate the genomes of organisms that have an important impact on biomedical research [3,4]

  • GO terms are manually associated with gene products by curators using two general methods: extracting annotations based on published experimental data; and inferring annotations based on homology with related gene products for which experimental data is available



Introduction

Background

The functional annotation of gene products, both proteins and RNAs, is a major endeavor that requires a judicious mix of manual analysis and computational tools. GO is one of the most widely used tools for functional annotation in the analysis of data from high-throughput experiments. Automated methods that are based on either sequence similarity or domain composition are used to make annotations without curator intervention. These different methods of assigning GO terms to gene products are distinguished by the use of different GO evidence codes [5]. The comprehensive annotation of a genome entails assigning functions to all gene products, including those that have not yet been experimentally characterized.
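The idea of inferring an ancestral function and propagating it to uncharacterized descendants can be illustrated with a small sketch. This is not the project's software: the `Node` class, the `propagate` function, the gene names, and the term ID `GO:9999999` are all hypothetical, and the choice of `EXP` and `IBA` as the evidence codes for experimental and ancestor-based annotations is an assumption made for illustration. The curator's judgment (which ancestor carries which term) is taken as given; the code only shows the propagation step and how evidence codes keep the two kinds of annotation distinguishable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy gene-family tree; leaves are present-day genes."""
    name: str
    children: list["Node"] = field(default_factory=list)
    # Maps GO term ID -> evidence code for annotations on this node.
    annotations: dict[str, str] = field(default_factory=dict)

    def leaves(self):
        """Yield every leaf at or below this node, left to right."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def propagate(ancestor: Node, go_term: str, inferred_code: str = "IBA") -> list[str]:
    """Copy a term assigned to `ancestor` onto its descendant leaves.

    Leaves that already carry the term (for example from an experiment)
    keep their existing evidence code. Returns the names of the leaves
    that received a new, inferred annotation.
    """
    newly_annotated = []
    for leaf in ancestor.leaves():
        if go_term not in leaf.annotations:
            leaf.annotations[go_term] = inferred_code
            newly_annotated.append(leaf.name)
    return newly_annotated


# Hypothetical family: two genes have experimental support for a made-up term,
# and two related genes have no experimental data.
yeast = Node("yeast_gene", annotations={"GO:9999999": "EXP"})
mouse = Node("mouse_gene", annotations={"GO:9999999": "EXP"})
human = Node("human_gene")
zebrafish = Node("zebrafish_gene")
vertebrate = Node("vertebrate_ancestor", children=[mouse, human, zebrafish])
root = Node("family_root", children=[yeast, vertebrate])

# Assume a curator has decided the family's common ancestor had this function.
print(propagate(root, "GO:9999999"))  # → ['human_gene', 'zebrafish_gene']
```

Because each annotation keeps its evidence code, downstream users can still separate experimentally supported annotations from inferred ones, which is the role the text assigns to GO evidence codes.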
