Abstract

Background: The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. GOtcha predicts GO term associations with term-specific probability (P-score) measures of confidence. Term-specific probabilities are a novel feature of GOtcha and allow the identification of conflicts or uncertainty in annotation.

Results: The GOtcha method was applied to the recently sequenced genome for Plasmodium falciparum and six other genomes. GOtcha was compared quantitatively for retrieval of assigned GO terms against direct transitive assignment from the highest scoring annotated BLAST search hit (TOPBLAST). GOtcha exploits information deep into the 'twilight zone' of similarity search matches, making use of much information that is otherwise discarded by more simplistic approaches. At a P-score cutoff of 50%, GOtcha provided 60% better recovery of annotation terms and 20% higher selectivity than annotation with TOPBLAST at an E-value cutoff of 10^-4.

Conclusions: The GOtcha method is a useful tool for genome annotators. It has identified both errors and omissions in the original Plasmodium falciparum annotation and is being adopted by many other genome sequencing projects.
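As an illustration of the TOPBLAST baseline described above, the following is a minimal sketch, not the authors' implementation. It assumes search hits are already parsed into simple records; the names `Hit`, `topblast_terms` and `annotations`, the use of bit score to rank hits, and the inclusive E-value comparison are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Hit:
    """One database search match (hypothetical record layout)."""
    subject_id: str
    bit_score: float
    e_value: float


def topblast_terms(
    hits: Iterable[Hit],
    annotations: Dict[str, Set[str]],
    e_cutoff: float = 1e-4,
) -> Set[str]:
    """Copy GO terms from the highest scoring annotated hit within the cutoff.

    Hits without any GO annotation are skipped, so the "top" hit is the best
    scoring one that actually carries annotation.
    """
    eligible = [
        h for h in hits
        if h.e_value <= e_cutoff and annotations.get(h.subject_id)
    ]
    if not eligible:
        return set()
    best = max(eligible, key=lambda h: h.bit_score)
    return set(annotations[best.subject_id])
```

Every hit other than the single best annotated one is ignored here, which is the information the abstract says GOtcha makes use of instead of discarding.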

Highlights

  • The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences

  • The GOtcha method is a useful tool for genome annotators

  • In this paper we present a novel method, GOtcha, that can be applied to any database search technique that returns scored matches


Summary

Introduction

The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. In the context of this paper, the term 'function' refers to all aspects of a gene product's behaviour. This includes the concepts described by the Gene Ontology classifications for Molecular Function, Biological Process and Cellular Component. We have created an empirically based estimate of accuracy (the P-score, expressed as a percentage) that can be used to indicate confidence in the prediction of association between a GO term and a gene product. A background set of 518,226 annotated sequences from the SwissProt gene associations was included in the accuracy estimate after excluding taxa corresponding to the search databases and their subspecies. For GO terms where there are too few datapoints to estimate accuracy reliably, accuracy estimation falls back to a scoring table that combines results over all GO terms from that ontology with the same number of ancestors.
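The fallback rule described above can be sketched as a two-level table lookup. This is an illustration under stated assumptions rather than the published procedure: the binning of raw scores into `score_bin`, the `min_points` threshold, and every name in the code (`p_score`, `per_term`, `pooled`, `ontology`, `ancestor_count`) are hypothetical.

```python
from typing import Dict, Optional, Tuple

Counts = Tuple[int, int]  # (correct predictions, total predictions)


def p_score(
    term: str,
    score_bin: int,
    per_term: Dict[str, Dict[int, Counts]],
    pooled: Dict[Tuple[str, int], Dict[int, Counts]],
    ontology: Dict[str, str],
    ancestor_count: Dict[str, int],
    min_points: int = 30,  # assumed threshold for "few datapoints"
) -> Optional[float]:
    """Return an empirical accuracy estimate as a percentage, or None.

    Uses the term-specific table when it holds enough datapoints; otherwise
    falls back to the table pooled over all terms in the same ontology that
    have the same number of ancestors.
    """
    correct, total = per_term.get(term, {}).get(score_bin, (0, 0))
    if total < min_points:
        key = (ontology[term], ancestor_count[term])
        correct, total = pooled.get(key, {}).get(score_bin, (0, 0))
    if total == 0:
        return None  # no evidence at either level
    return 100.0 * correct / total
```

Pooling by ontology and ancestor count groups terms of comparable depth, so a sparsely observed term borrows its estimate from terms of similar specificity rather than from the ontology as a whole.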
