Evaluation of BioCreAtIvE assessment of task 2

Christian Blaschke, Alfonso Valencia, Martin Krallinger, Eduardo Andres Leon

doi:10.1186/1471-2105-6-s1-s16


Open Access

https://doi.org/10.1186/1471-2105-6-s1-s16


Journal: BMC Bioinformatics
Publication Date: May 1, 2005
Citations: 140
License type: cc-by

Affiliation: Centro Nacional de Biotecnología

Abstract

Background: Molecular biology has accumulated substantial amounts of data concerning the functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. These annotation databases are crucial for interpreting large-scale analyses based on bioinformatics or experimental techniques. Because functional descriptions are accumulating rapidly in the biomedical literature, there is an urgent need for text mining tools that facilitate the extraction of such annotations. To make text mining tools usable in real-world scenarios, for instance to assist database curators during the annotation of protein function, comparisons and evaluations of different approaches on full-text articles are needed.

Results: The Critical Assessment for Information Extraction in Biology (BioCreAtIvE) contest is a community-wide competition that aims to evaluate different strategies for text mining tools as applied to the biomedical literature. We report on task 2, which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations of human proteins using full-text articles. The predictions of task 2 are based on triplets of protein – GO term – article passage. The annotation-relevant text passages were returned by the participants and evaluated by expert curators of the GO annotation (GOA) team at the European Bioinformatics Institute (EBI). Each participant could submit up to three results for each sub-task of task 2. In total, the participants provided more than 15,000 individual results. In addition to the annotation itself, the curators evaluated whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.

Conclusion: The concepts provided by GO are currently the most extensive set of terms used for annotating gene products, so they were used to assess how effectively text mining tools can extract those annotations automatically. Although the results obtained are promising, they are still far from the performance demanded by real-world applications. The principal difficulties encountered in addressing the task were the complex nature of GO terms and protein names (the large range of variants used to express proteins, and especially GO terms, in free text) and the lack of a standard training set. A range of very different strategies was used to tackle this task. The dataset generated for the BioCreative challenge is publicly available and opens new possibilities for training information extraction methods in the domain of molecular biology.
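The protein – GO term – article passage triplet described in the abstract, together with the curators' two checks, can be pictured as a small record. The following Python sketch is illustrative only: the class name, the field names, the reduction of each curator judgement to a true/false flag, and the placeholder identifiers are all assumptions made here, not the format or the scoring scheme used by the organisers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prediction:
    """One submitted triplet plus hypothetical curator judgements."""
    protein: str       # protein named in the submission
    go_term: str       # GO term assigned to that protein
    passage: str       # text fragment offered as evidence
    protein_ok: bool   # assumed flag: passage supports the protein
    go_term_ok: bool   # assumed flag: passage supports the GO term


def fraction_fully_correct(predictions: Sequence[Prediction]) -> float:
    """Share of predictions where both protein and GO term were judged correct."""
    if not predictions:
        return 0.0
    correct = sum(1 for p in predictions if p.protein_ok and p.go_term_ok)
    return correct / len(predictions)


if __name__ == "__main__":
    # Placeholder values, not taken from the BioCreAtIvE dataset.
    submitted = [
        Prediction("protein-A", "GO:0000000", "passage 1", True, True),
        Prediction("protein-A", "GO:0000000", "passage 2", True, False),
        Prediction("protein-B", "GO:0000000", "passage 3", False, True),
        Prediction("protein-B", "GO:0000000", "passage 4", False, False),
    ]
    print(fraction_fully_correct(submitted))
```

Counting a prediction as correct only when both flags hold mirrors the requirement that the protein and the GO term both be traceable through the same text fragment; a real evaluation may well use finer-grained judgements than a single boolean per element.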

Highlights

Molecular biology has accumulated substantial amounts of data concerning the functions of genes and proteins.
Among those initiatives are the Critical Assessment of Microarray Data Analysis (CAMDA) contest to analyze the performance of microarray bioinformatics tools [1] and the Critical Assessment of PRediction of Interactions (CAPRI) contest for the assessment of protein interaction prediction techniques [2].
The dataset produced for task 2 of the BioCreative contest is freely available from http://www.pdg.cnb.uam.es/BioLINK/BioCreative.eval.html [18] and is provided in an XML-like format.

Summary

Introduction

Molecular biology has accumulated substantial amounts of data concerning the functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. These annotation databases are crucial for interpreting large-scale analyses based on bioinformatics or experimental techniques. CASP (the Critical Assessment of protein Structure Prediction) has been running for a decade and has served as a model for later initiatives. Among those initiatives are the Critical Assessment of Microarray Data Analysis (CAMDA) contest to analyze the performance of microarray bioinformatics tools [1] and the Critical Assessment of PRediction of Interactions (CAPRI) contest for the assessment of protein interaction prediction techniques [2].

