Abstract

Background

Functional annotation of genes and gene products is a major challenge in the post-genomic era. Nowadays, gene function curation is largely based on the manual assignment of Gene Ontology (GO) annotations to genes using published literature. The annotation task is extremely time-consuming; therefore, there is increasing interest in automated tools that can assist human experts.

Results

Here we introduce GOTA, a GO term annotator for biomedical literature. The proposed approach uses only information that is readily available from public repositories, and it is easily expandable to handle novel sources of information. We assess the classification capabilities of GOTA on a large benchmark set of publications. The overall performance is encouraging in comparison to the state of the art in multi-label classification over large taxonomies. Furthermore, the experimental tests provide some interesting insights into the potential improvement of automated annotation tools.

Conclusions

GOTA implements a flexible and expandable model for GO annotation of biomedical literature. The current version of the GOTA tool is freely available at http://gota.apice.unibo.it.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0777-8) contains supplementary material, which is available to authorized users.

Highlights

  • At the time of retrieval, the Gene Ontology (GO) vocabulary consisted of 39,399 distinct terms partitioned into three main categories, each structured as a directed acyclic graph (DAG) with a unique root: 26,099 terms of type Biological Process (BP), 9,753 of type Molecular Function (MF) and 3,547 of type Cellular Component (CC)

  • Performance is assessed over the entire GO hierarchy, without considering the three main ontologies (BP, MF and CC) separately
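
As a rough illustration of the DAG structure mentioned in the highlights, the Python sketch below stores a toy ontology as child-to-parent links and collects every ancestor of a term. The term names and the `ancestors` helper are hypothetical placeholders, not real GO identifiers and not part of GOTA; only the three category counts are taken from the text.

```python
from collections import deque

# Toy DAG stored as child -> list of parents.
# Term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],  # in a DAG a term may have more than one parent
    "d": ["c"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


# Category sizes quoted in the text; they should sum to the vocabulary size.
GO_TERM_COUNTS = {"BP": 26099, "MF": 9753, "CC": 3547}
TOTAL_TERMS = sum(GO_TERM_COUNTS.values())
```

Because each category has a unique root, every non-root term in the toy graph has that root among its ancestors, and the three category counts add up to the 39,399 terms reported above.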

Summary

Introduction

Functional annotation of genes and gene products is a major challenge in the post-genomic era. Gene function curation is largely based on the manual assignment of Gene Ontology (GO) annotations to genes using published literature. Because the annotation task is extremely time-consuming, there is increasing interest in automated tools that can assist human experts. GO is the de facto standard for functional annotation of genes [2, 3]. The two main efforts of the GO project involve: i) the development and maintenance of a controlled vocabulary (ontologies) of functional attributes; ii) the annotation of genes in terms of their associated attributes. Currently, GO annotations derived from manual curation of scientific literature can still be regarded as the gold standard in terms of quality and specificity. However, the manual annotation step is one of the major bottlenecks in GO curation.
