Event extraction on PubMed scale

Filip Ginter, Sampo Pyysalo, Jari Björne

doi:10.1186/1471-2105-11-s5-o2

Abstract

There has been a growing interest in typed, recursively nested events as the target for information extraction in the biomedical domain. The BioNLP'09 Shared Task on Event Extraction [1] provided a standard definition of events and established the current state of the art in event extraction through competitive evaluation on a standard dataset derived from the GENIA event corpus. We have previously established the scalability of event extraction to large corpora [2], and here we present a follow-up study in which event extraction is performed on the titles and abstracts of all 17.8M citations in the 2009 release of PubMed. The extraction pipeline is composed of state-of-the-art methods: the BANNER named entity recognizer [3], the McClosky-Charniak domain-adapted parser [4], and the Turku Event Extraction System [5], the winning entry of the Shared Task. The resulting dataset consists of over 19.2M instances of 4.5M unique events, of which 2.1M instances of 1.6M unique events recursively involve at least two different named entities. This dataset is several orders of magnitude larger than any previous event extraction effort and, having been obtained by a demonstrably state-of-the-art pipeline, represents the most accurate event extraction output achievable with presently available tools. Compiling the dataset was a technically challenging undertaking that required roughly 8,300 CPU-hours. As the primary contribution of the study, we make the entire set of extracted events freely available at http://bionlp.utu.fi, together with the output of the individual stages of the pipeline, including 36.5M named entity instances and syntactic analyses for all 20M sentences containing at least one named entity. This resource will facilitate future research on biological event networks by providing a standard, publicly available, large-scale dataset, avoiding unnecessary duplication of effort in executing the complex event extraction pipeline.
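The distinction the abstract draws between event instances, unique events, and events that recursively involve at least two different named entities can be illustrated with a small sketch. It is not the authors' implementation: the `Entity` and `Event` classes, the `entities` and `summarize` helpers, and the toy data are all hypothetical, and the sketch assumes that two instances count as the same unique event when their type, argument roles, and (recursively) their arguments are identical. The abstract does not state how uniqueness was actually determined.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity, identified here only by its name (an assumption)."""
    name: str


@dataclass(frozen=True)
class Event:
    """A typed event whose arguments are entities or other, nested events."""
    type: str
    args: Tuple[Tuple[str, Union["Entity", "Event"]], ...]


def entities(node):
    """Return the set of entity names reachable from a node, recursing into nested events."""
    if isinstance(node, Entity):
        return {node.name}
    found = set()
    for _role, arg in node.args:
        found |= entities(arg)
    return found


def summarize(instances):
    """Count instances and unique events, overall and for events with >= 2 distinct entities."""
    counts = Counter(instances)  # frozen dataclasses hash by structure
    multi = {ev: n for ev, n in counts.items() if len(entities(ev)) >= 2}
    return {
        "instances": sum(counts.values()),
        "unique": len(counts),
        "multi_entity_instances": sum(multi.values()),
        "multi_entity_unique": len(multi),
    }


# Toy data (illustrative only): one simple event and one event nesting it.
il2 = Entity("IL-2")
nfkb = Entity("NF-kappaB")
expression = Event("Gene_expression", (("Theme", il2),))
regulation = Event("Positive_regulation", (("Cause", nfkb), ("Theme", expression)))

# Five extracted instances, as if found in five different sentences.
instances = [expression, expression, regulation, expression, regulation]
print(summarize(instances))
```

In this toy input, the expression event involves a single entity, so only the nested regulation event falls into the "at least two different named entities" subset; the entity set is gathered by walking the nesting, which is what "recursively" refers to in this sketch.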

Journal: BMC Bioinformatics
Publication Date: Oct 1, 2010
Citations: 9
License type: CC BY 2.0

Similar Papers

University of Turku in the BioNLP'11 Shared Task
Jari Björne ... Tapio Salakoski
BMC Bioinformatics | VOL. 13 | 01 Jun 2012

Interactive learning for joint event and relation extraction
Jingli Zhang ... Wenxuan Zhou
International Journal of Machine Learning and Cybernetics | VOL. 11 | 22 Jul 2019

Domain transformation on biological event extraction by learning methods.
Wen Juan Hou ... Bamfa Ceesay
Journal of Biomedical Informatics | VOL. 95 | 18 Jun 2019

Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing
Jari Björne ... Tapio Salakoski
01 Jan 2018
