Abstract

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that detects annotation errors using rules that pair mutually exclusive processes. The workflow was developed from, and validated by, case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes that are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
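To make the idea of a rule based on mutually exclusive processes concrete, the following is a minimal Python sketch, not the authors' workflow. The gene names, the annotation list and the function name `find_violations` are hypothetical; only the example process pair (amino acid metabolism and cytokinesis) comes from the abstract. Each rule is a pair of process terms, and a gene product annotated to both terms is flagged for review.

```python
from collections import defaultdict

# Hypothetical annotations: (gene product, GO process label). Not real data.
annotations = [
    ("geneA", "amino acid metabolism"),
    ("geneA", "cytokinesis"),
    ("geneB", "cytokinesis"),
    ("geneC", "amino acid metabolism"),
]

# Each rule names two processes that should not share gene products.
rules = [frozenset({"amino acid metabolism", "cytokinesis"})]


def find_violations(annotations, rules):
    """Return sorted (gene, term_1, term_2) triples that break a rule."""
    terms_by_gene = defaultdict(set)
    for gene, term in annotations:
        terms_by_gene[gene].add(term)

    violations = []
    for gene, terms in terms_by_gene.items():
        for rule in rules:
            if rule <= terms:  # both terms of the pair are annotated to this gene
                violations.append((gene, *sorted(rule)))
    return sorted(violations)


violations = find_violations(annotations, rules)
print(violations)  # [('geneA', 'amino acid metabolism', 'cytokinesis')]
```

A flagged pair is only a candidate error: as the abstract describes, each one still has to be traced to its source (curation, ontology structure or an automated pipeline) before anything is corrected.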

Highlights

  • The Gene Ontology (GO; http://geneontology.org) is the most widely adopted resource for systematic representation of gene product functions [1,2,3]

  • Every GO term has a human-readable text definition, and a growing number have logical definitions that explicitly refer to terms in GO and other Open Biomedical Ontology (OBO) ontologies [2,3,5,6]. (More formally, logical definitions use equivalence axioms expressed in OWL, the Web Ontology Language [7], to ‘specify necessary and sufficient conditions for class membership’ for an ontology term.) Such definitions facilitate ontology structure maintenance and quality control

  • Pairs of terms from the fission yeast GO slim with co-annotations were evaluated over time and visualized as described in the Methods


Summary

Introduction

The Gene Ontology (GO; http://geneontology.org) is the most widely adopted resource for systematic representation of gene product functions [1,2,3]. GO comprises the ontology itself and a set of annotations that use the ontology to describe gene products. Every GO term has a human-readable text definition, and a growing number have logical definitions that explicitly refer to terms in GO and other Open Biomedical Ontology (OBO) ontologies [2,3,5,6]. (More formally, logical definitions use equivalence axioms expressed in OWL, the Web Ontology Language [7], to ‘specify necessary and sufficient conditions for class membership’ for an ontology term.) Such definitions facilitate ontology structure maintenance and quality control. Functional studies use subsets of the ontology (sometimes known as ‘GO slims’) that exclude highly specific terms and take advantage of the fact that annotations are propagated over transitive relations (e.g. is_a, part_of) in the ontology.
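Propagation over transitive relations can be illustrated with a small Python sketch. The toy ontology below, the one-term slim and the function names `ancestors` and `map_to_slim` are assumptions made for illustration; they are not taken from GO releases or from the authors' code. An annotation to a specific term is treated as implying every term reachable through is_a or part_of, and the slim terms among those ancestors are the ones reported.

```python
# Hypothetical toy ontology: child term -> list of (relation, parent term).
PARENTS = {
    "mitotic cytokinesis": [("is_a", "cytokinesis")],
    "cytokinesis": [("part_of", "cell division")],
    "cell division": [("is_a", "cellular process")],
    "cellular process": [],
}

# Relations over which annotations are propagated in this sketch.
PROPAGATING = {"is_a", "part_of"}


def ancestors(term, parents=PARENTS):
    """Return the term plus every term reachable over propagating relations."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in parents.get(current, []):
            if relation in PROPAGATING and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, slim):
    """Return the slim terms that a direct annotation to `term` implies."""
    return ancestors(term) & slim


slim = {"cell division"}  # hypothetical one-term slim
slim_hits = map_to_slim("mitotic cytokinesis", slim)
print(slim_hits)  # {'cell division'}
```

Because the slim leaves out highly specific terms, a gene product annotated only to "mitotic cytokinesis" is still counted under "cell division" in this toy example.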

