Abstract

Motivation: Event extraction using expressive structured representations has been a significant focus of recent efforts in biomedical information extraction. However, event extraction resources and methods have so far focused almost exclusively on molecular-level entities and processes, limiting their applicability.

Results: We extend the event extraction approach to biomedical information extraction to encompass all levels of biological organization from the molecular to the whole organism. We present the ontological foundations, target types and guidelines for entity and event annotation and introduce the new multi-level event extraction (MLEE) corpus, manually annotated using a structured representation for event extraction. We further adapt and evaluate named entity and event extraction methods for the new task, demonstrating that both can be achieved with performance broadly comparable with that for established molecular entity and event extraction tasks.

Availability: The resources and methods introduced in this study are available from http://nactem.ac.uk/MLEE/.

Contact: pyysalos@cs.man.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Introduction

A detailed understanding of biological systems requires the ability to trace cause and effect across multiple levels of biological organization, from molecular-level reactions to cellular, tissue- and organ-level effects to organism-level outcomes (Kitano, 2002). Efforts in domain information extraction (IE) initially focused on the basic task of recognizing mentions of relevant entities such as genes and proteins in text (Yeh et al., 2005) and on the extraction of pairwise relations between these, representing, for example, protein–protein interactions (Krallinger et al., 2007; Nédellec, 2005). Such representations lack the capacity to capture any but the simplest of associations. More recently, there has been increasing interest in the extraction of structured representations capable of capturing associations of arbitrary numbers of participants in specific roles. Such approaches to IE, frequently termed event extraction, are capable of representing complex associations, such as the binding of one protein to another inhibiting its localization to a specific cellular compartment (Fig. 1), and open many new opportunities for domain text mining applications ranging from semantic search to database and pathway curation support (Ananiadou et al., 2010).

There is significant momentum behind the move to richer representations for IE: more than 30 groups have introduced methods for biomedical event extraction in shared tasks (Kim et al., 2011a, b); event-annotated corpora have been introduced for many extraction targets, including DNA methylation (Ohta et al., 2011a), protein modifications (Pyysalo et al., 2011) and the molecular mechanisms of infectious diseases (Pyysalo et al., 2012c); event extraction methods have been applied to automatically analyze all 20 million PubMed abstracts (Björne et al., 2010); and event extraction analyses are being integrated into literature search systems such as MEDIE and applied in support of advanced tasks such as pathway curation (Ohta et al., 2011b).
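
To make the contrast with pairwise relations concrete, the following is a minimal sketch of how an event with typed participants in named roles might be held in memory. It is illustrative only: the class names, the helper `participants`, the entity mentions ("protein A", "protein B", "nucleus") and the type and role labels (`Binding`, `Localization`, `Negative_regulation`, `Theme`, `Cause`, `ToLoc`) are assumptions chosen for the example, not the annotation scheme or file format defined in this study.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A typed mention in text (hypothetical minimal form)."""
    id: str
    type: str
    text: str


@dataclass(frozen=True)
class Event:
    """A typed association: a trigger word plus any number of role-labelled
    arguments, each of which may be an entity or another event."""
    id: str
    type: str
    trigger: str
    args: Tuple[Tuple[str, Union["Entity", "Event"]], ...]


def participants(event):
    """Collect the entities reachable from an event, descending into nested events."""
    found = []
    for _role, arg in event.args:
        if isinstance(arg, Event):
            found.extend(participants(arg))
        else:
            found.append(arg)
    return found


# Hypothetical mentions standing in for the example described in the text.
protein_a = Entity("T1", "Protein", "protein A")
protein_b = Entity("T2", "Protein", "protein B")
nucleus = Entity("T3", "Cellular_component", "nucleus")

# Binding of one protein to another ...
binding = Event("E1", "Binding", "binds",
                (("Theme", protein_a), ("Theme", protein_b)))
# ... localization of the second protein to a compartment ...
localization = Event("E2", "Localization", "localization",
                     (("Theme", protein_b), ("ToLoc", nucleus)))
# ... and the binding inhibiting that localization: an event over events.
inhibition = Event("E3", "Negative_regulation", "inhibits",
                   (("Cause", binding), ("Theme", localization)))

names = [entity.text for entity in participants(inhibition)]
```

A pairwise relation could record only that two of these proteins are associated; the nested structure additionally records which event causes which, and in what role each participant takes part.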
