STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation

Tobias Wittkop, Nigam H Shah, Corey Powell, Ari E Berman, Uday S Evani, Sean D Mooney, K Mathew Fleisch, Emily Teravest

doi:10.1186/1471-2105-14-53

Abstract

Background: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins.

Results: We developed the method Statistical Tracking of Ontological Phrases (STOP), which expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While these automated annotations are not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms.

Conclusion: Multiple ontologies have been developed for gene and protein annotation. By using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.

Highlights

Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets.
Term enrichment analysis, which refers to the search for ontology terms that occur more often in a given gene list than in a background gene set, can be used to generate new scientific hypotheses.
We find that automated annotations generated in this manner reliably recover the known annotations already present in the text record, and that we are able to annotate with a wide spectrum of concepts not available in any currently used ontology enrichment tool.

Summary

Introduction

Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. We developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. High throughput experimentation, such as gene expression microarrays, next-generation sequencing or proteomics, enables the interrogation of many thousands, or even millions, of data points simultaneously. Comparison between these experiments (such as a phenotype and a control) enables identification of gene or protein sets of interest in a hypothesis-free manner. We believe that a hybrid approach, testing manually curated terms along with automatically recognized concepts from curated text, will yield more hypotheses and be more useful to the researcher.
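To make the idea of term enrichment concrete, the following is a minimal sketch of how one might test whether an ontology term is over-represented in a gene list relative to a background set. It assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis; this section does not state which statistic STOP itself uses. The function names (`hypergeom_sf`, `enrich`), the gene identifiers and the term-to-gene mapping are all hypothetical and for illustration only, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for X ~ Hypergeometric.

    M: background size, n: background genes annotated with the term,
    N: size of the gene list, k: annotated genes observed in the list.
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrich(gene_list, background, annotations):
    """Rank terms by one-sided hypergeometric p-value (assumed statistic).

    annotations maps a term name to the set of genes annotated with it.
    Returns a list of (term, overlap, p_value), smallest p-value first.
    """
    background = set(background)
    study = set(gene_list) & background
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        k = len(annotated & study)
        if k == 0:
            continue  # a term absent from the list cannot be enriched
        p = hypergeom_sf(k, len(background), len(annotated), len(study))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes and two annotated terms.
background = [f"g{i}" for i in range(20)]
annotations = {
    "apoptosis": {f"g{i}" for i in range(5)},       # g0..g4
    "metabolism": {f"g{i}" for i in range(10, 20)},  # g10..g19
}
results = enrich(["g0", "g1", "g2", "g10"], background, annotations)
for term, overlap, p in results:
    print(f"{term}\toverlap={overlap}\tp={p:.4f}")
```

In this toy example, three of the four listed genes carry the "apoptosis" annotation although only 5 of the 20 background genes do, so that term ranks first. The same test applies unchanged whether the term-to-gene mapping comes from manually curated GO annotations or from concepts recognized automatically in curated text.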

Methods

Results

Discussion

Conclusion