Abstract

Background: Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes.

Results: We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011.

Conclusions: The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora.

Highlights

  • We show that using our approach, EventMine can outperform all previously proposed methods on two benchmark tasks established by the BioNLP Shared Task (ST) 2011, the Epigenetics and Post-translational Modifications (EPI) and Infectious Diseases (ID) tasks [4]

  • This paper has presented an approach to the construction of a wide coverage information extraction system through training on multiple corpora with partially overlapping annotation scopes

Summary

Introduction

Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. Manual annotation is time-consuming and expensive, and annotation efforts become increasingly demanding as more types of entities, relations and events are included in the scope of annotation, so any single event-annotated corpus can only cover a limited number of semantic types. Each annotation effort tends to focus on a limited number of semantic types relevant to its immediate aims, which in turn results in the proliferation of corpora that overlap only partially in semantic scope, if at all [3,4,5,6,7].
