Automatic, context-specific generation of Gene Ontology slims

Melissa J Davis, Mark A Ragan, Muhammad Shoaib B Sehgal

doi:10.1186/1471-2105-11-498

Open Access

https://doi.org/10.1186/1471-2105-11-498

Abstract

Background: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual.

Results: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches, and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power.

Conclusions: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
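To make "collapsing terms upward" concrete, here is a minimal sketch of mapping fine-grained annotations onto a chosen set of high-level slim terms. It is an illustration only, not the authors' pipeline: the toy hierarchy, the term names, the gene names, and the helper functions `ancestors` and `collapse_to_slim` are all hypothetical.

```python
from collections import defaultdict

# Toy is_a hierarchy (child -> parents). Term names are hypothetical
# placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sterol_metabolism": ["lipid_metabolism"],
    "ion_transport": ["transport"],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def collapse_to_slim(annotations, slim_terms):
    """Replace each gene's terms with the slim terms among their ancestors."""
    slim = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            slim[gene] |= ancestors(term) & slim_terms
    return dict(slim)


# Hypothetical annotations: gene -> directly annotated terms.
annotations = {
    "geneA": {"sterol_metabolism"},
    "geneB": {"ion_transport", "lipid_metabolism"},
}
slim = collapse_to_slim(annotations, {"metabolism", "transport"})
```

In this sketch the slim terms are supplied by hand, which corresponds to the manual, expert-driven step described in the Background; the framework presented in the paper aims to choose them objectively instead.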

Highlights

The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains.
In this paper we introduce a general approach, based on ontology management principles, graph theory and information theory, for the automated generation of ontology slims based on information obtained from both annotations and the ontology structure, and we illustrate the application of this method to the generation of high-quality Gene Ontology (GO) slims at a series of information content thresholds.
GO slim for yeast: Here we analyse a set of GO slims generated across a range of information content thresholds on the yeast GO annotation contained in the Saccharomyces Genome Database (SGD) [27], and compare them with the manually created yeast GO slim maintained by the yeast community.
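The highlights refer to information content thresholds without defining them on this page. As a hedged illustration, the sketch below uses the commonly used annotation-frequency definition, IC(t) = -log2 p(t), where p(t) is the fraction of annotated genes attached to term t or any of its descendants. Whether the paper uses exactly this definition, and how it selects slim terms from it, is not stated here, so treat the toy hierarchy, the data, and the functions `information_content` and `terms_at_or_below` as assumptions.

```python
import math

# Toy is_a hierarchy (child -> parents); names are hypothetical, not real GO IDs.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sterol_metabolism": ["lipid_metabolism"],
    "ion_transport": ["transport"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def information_content(annotations, parents):
    """IC(t) = -log2(fraction of genes annotated to t or a descendant of t)."""
    genes_per_term = {term: set() for term in parents}
    for gene, terms in annotations.items():
        for term in terms:
            # Propagate each annotation to all ancestors of the term.
            for anc in ancestors(term, parents):
                genes_per_term[anc].add(gene)
    total = len(annotations)
    return {
        term: -math.log2(len(genes) / total)
        for term, genes in genes_per_term.items()
        if genes  # terms with no annotated genes have undefined IC
    }


def terms_at_or_below(ic, threshold):
    """Terms general enough that their IC does not exceed the threshold."""
    return {term for term, value in ic.items() if value <= threshold}


# Hypothetical annotations: gene -> directly annotated terms.
annotations = {
    "g1": {"sterol_metabolism"},
    "g2": {"lipid_metabolism"},
    "g3": {"ion_transport"},
    "g4": {"metabolism"},
}
ic = information_content(annotations, PARENTS)
general_terms = terms_at_or_below(ic, 1.0)
```

Raising the threshold admits more specific terms and lowering it keeps only broad ones, which is one way to picture a series of slims at different levels of detail.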

Summary

Introduction

The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. The Gene Ontology Consortium, which is responsible for the ongoing development of GO, draws its members from a number of organism-specific databases including FlyBase [2], Mouse Genome Database [3], WormBase [4], the Arabidopsis Information Resource [5], and the Zebrafish Information Network [6]. These consortium members, and others such as the Gene Ontology Annotation Database [7,8], produce GO annotations for public use. This community structure has contributed to the broad acceptance and adoption of GO as the primary controlled vocabulary for molecular genetics and genomics.

Methods

Results

Conclusion

Journal: BMC Bioinformatics	Publication Date: Oct 7, 2010
Citations: 69	License type: cc-by

Similar Papers

GOgetter: A pipeline for summarizing and visualizing GO slim annotations for plant genetic data.
Emily B Sessa ... Jessie A Pelosi
Applications in plant sciences | VOL. 11
01 Jul 2023

The Neural/Immune Gene Ontology: clipping the Gene Ontology for neurological and immunological systems
Nophar Geifman ... Eitan Rubin
BMC Bioinformatics | VOL. 11
12 Sep 2010

Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations
Gangbiao Liu ... Qiqun Cheng
Molecular Genetics and Genomics | VOL. 289
10 Dec 2013

Taxonomy-based partitioning of the Gene Ontology
Wacław Kuśnierczyk
Journal of Biomedical Informatics | VOL. 41
29 Aug 2007
