Abstract

BioCreative: Critical Assessment of Information Extraction in Biology is an international, community-wide effort for evaluating text mining (TM) and information extraction systems applied to the biological domain (http://www.biocreative.org/). The Challenge Evaluations and the accompanying BioCreative Workshops bring together the TM and biology communities to drive the development of practically relevant TM systems. One of the main goals of this initiative is that the resulting systems give biologists more efficient access to the literature, while also providing tools that can be directly integrated into the biocuration workflow and the knowledge discovery process carried out by databases. Beyond addressing the current barriers faced by TM technologies applied to biological literature, BioCreative has also conducted user requirement analyses and user-based evaluations, and has fostered standards development for TM tool reuse and integration. This DATABASE virtual issue captures the major results from the Fourth BioCreative Challenge Evaluation Workshop and is the sixth special issue devoted to BioCreative. Building on the success of the previous Challenge Evaluations and Workshops (BioCreative I, II, II.5, III, 2012) (1–5), the BioCreative IV Workshop was held in Bethesda, MD, on October 7–9, 2013.

Highlights

  • In the BioCreative Workshop 2012, we reviewed descriptions of curation workflows from expert-curated databases to identify commonalities and differences among them [15]

  • BioCreative is distinct from other challenges in the bioNLP domain in how it selects its specific tasks, or tracks

  • BioCreative has collaborated with curators from a variety of databases, including Gene Ontology Annotation [6], IntAct [7], MINT [8], BioGRID [9], FlyBase [10], Mouse Genome Database [11], TAIR [12], Comparative Toxicogenomics Database (CTD) [13] and WormBase [14]


Introduction

In the BioCreative Workshop 2012, we reviewed descriptions of curation workflows from expert-curated databases to identify commonalities and differences among them [15]. Challenge Evaluation tasks over the years have included ranking of relevant documents (‘document triage’), extraction of genes and proteins (‘gene mention’) and their linkage to database identifiers (‘gene normalization’), as well as extraction of functional annotations in standard ontologies (e.g. GO [16]) and extraction of entity relations (e.g. protein–protein interactions [17]).
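
To make these task types concrete, the following is a minimal sketch, assuming a toy dictionary-based approach rather than any system actually evaluated at BioCreative, of what ‘gene mention’ tagging and ‘gene normalization’ look like in practice. The lexicon entries, the mapping to HGNC identifiers and all function names are invented for illustration.

    import re

    # Toy lexicon mapping lowercased surface forms to database identifiers.
    # Real systems combine large curated lexicons with machine learning;
    # this hard-coded dictionary is a stand-in for illustration only.
    GENE_LEXICON = {
        "brca1": "HGNC:1100",
        "tp53": "HGNC:11998",
        "p53": "HGNC:11998",  # synonym resolving to the same identifier
    }

    def normalize(mention):
        """'Gene normalization': map a mention string to an identifier."""
        return GENE_LEXICON.get(mention.lower())

    def tag_gene_mentions(text):
        """'Gene mention': find candidate gene names and normalize them."""
        hits = []
        for token in re.findall(r"[A-Za-z0-9]+", text):
            identifier = normalize(token)
            if identifier is not None:
                hits.append((token, identifier))
        return hits

    if __name__ == "__main__":
        sentence = "Mutations in BRCA1 and p53 are implicated in tumorigenesis."
        print(tag_gene_mentions(sentence))
        # [('BRCA1', 'HGNC:1100'), ('p53', 'HGNC:11998')]

A ‘document triage’ system would sit in front of such a tagger, ranking abstracts by their likely relevance to curation before any entity extraction is attempted.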
