Abstract

Background:Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation.Results:Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources.Conclusion:We assigned a confidence score to each gene/term combination. The results provided high prediction performance, with nearly every GO term achieving greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. We also illustrate that a combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone in their application to M. musculus.

Highlights

  • With the ever-increasing collection of high-throughput experimental techniques, data acquisition at the genomic scale has never occurred more rapidly

  • We present results from one of the nine modeling approaches submitted for the MouseFunc project, producing an updated set of genome-wide annotation predictions for approximately 3,000 Gene Ontology (GO) terms [16] for M. musculus

  • Expanded and more recent sets of mammalian phenotype and GO term annotations for the same mouse genes were acquired from Mouse Genome Informatics (MGI) [17]

Read more

Summary

Introduction

With the ever-increasing collection of high-throughput experimental techniques, data acquisition at the genomic scale has never occurred more rapidly. Comprehensive annotation systems are of paramount importance, as evidenced by the integration of a large number of data types in many model organism databases [1,2,3,4] Such databases are the researcher's starting point for informed hypothesis generation, making the daunting task of curating the source data for representation in model organism databases crucial to effective science. Recognizing this problem, curation systems are becoming increasingly reliant on computational approaches to assist in the annotation process. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.