Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

Thorsten Barnickel, Volker Stümpflen, Hans-Werner Mewes, Jason Weston, Ronan Collobert

doi:10.1371/journal.pone.0006393

Abstract

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches, which lack semantic relations, and highly specialized, computationally demanding NLP approaches.
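The figures quoted in the abstract can be combined into two derived quantities that are not stated in the text above: an F1 score from the reported precision and recall, and a rough per-node throughput from the reported cluster run. The following is a minimal sketch of that arithmetic. It assumes the standard harmonic-mean definition of F1, treats "within three days" as exactly three full days, and assumes that all 100 nodes were busy for the whole period, so the throughput is only a lower-bound estimate.

```python
# Figures taken from the abstract.
precision = 0.71
recall = 0.43
sentences = 89_000_000
nodes = 100
days = 3  # "within three days", treated here as an upper bound on run time

# F1 is the harmonic mean of precision and recall (derived here, not reported above).
f1 = 2 * precision * recall / (precision + recall)

# Lower-bound throughput, assuming every node ran for the full three days.
node_seconds = nodes * days * 24 * 3600
sentences_per_node_second = sentences / node_seconds

print(f"F1 = {f1:.2f}")
print(f"throughput >= {sentences_per_node_second:.1f} sentences per node per second")
```

Under these assumptions the F1 score is roughly 0.54 and each node processed at least about 3.4 sentences per second.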

Highlights

The rapidly increasing amount of biomedical publications is a key resource for the automated extraction and inference of relations between biomedical concepts such as protein-protein interactions or regulatory interrelations.
We presented a novel approach, based on the Semantic Role Labeling (SRL) program SENNA, for fast and reliable semantic role labeling of biomedical text corpora.
One such instance could be the extraction of protein transport relations mentioned within GeneRIFs, a set of sentences in the Entrez Gene database describing the function of a gene, where 85% of the protein transport predicates were reported to be used as nouns [22].

Summary

Introduction

The rapidly increasing amount of biomedical publications is a key resource for the automated extraction and inference of relations between biomedical concepts such as protein-protein interactions or regulatory interrelations. SENNA [14,15], a semantic role labeling program trained on the PropBank corpus, does not rely on the extraction of syntax trees for assigning semantic roles to sentence constituents. Instead, it uses a radically different approach compared to the existing SRL programs: skipping the step of syntax tree generation, SENNA's neural network architecture was trained directly on some basic, quickly derivable sentence features. In order to assess the applicability of SRL for extracting relations between biomedical entities, we examined how often the simplifying assumption holds true that all entities in the ARG0/ARG1 parts generated by an SRL program act as actor/target in the sense of the verb. This question is of crucial importance to assess whether the proposed SRL based approach can be used with sufficient reliability to build up a large scale biomedical text mining system. We chose SENNA for the evaluation of SRL based relation extraction (RE), applied SENNA to almost 90 million MEDLINE sentences, and compared its speed with syntactic parsers commonly used for relation extraction in the biological domain.
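The simplifying assumption described above (every entity found in ARG0 acts as actor, every entity found in ARG1 acts as target of the verb) can be illustrated with a short sketch. This is not SENNA's code or the authors' pipeline: the `Frame` class, the `find_entities` and `extract_relations` helpers, the substring-based entity matching, and the example frame are all hypothetical and chosen only to make the idea concrete. A real system would take the frames from an SRL tagger and the entities from a named entity recognizer.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Frame:
    """One predicate with its ARG0 and ARG1 text spans (hypothetical SRL output)."""
    predicate: str
    arg0: str
    arg1: str


def find_entities(span, lexicon):
    """Return lexicon entries that occur in the span (naive substring match)."""
    lowered = span.lower()
    return [name for name in lexicon if name.lower() in lowered]


def extract_relations(frame, lexicon):
    """Pair every ARG0 entity (actor) with every ARG1 entity (target)."""
    actors = find_entities(frame.arg0, lexicon)
    targets = find_entities(frame.arg1, lexicon)
    return [(actor, frame.predicate, target)
            for actor, target in product(actors, targets)]


# Invented example frame and entity lexicon, for illustration only.
frame = Frame(predicate="activates",
              arg0="The kinase RAF1",
              arg1="MEK1 and MEK2 in fibroblasts")
lexicon = ["RAF1", "MEK1", "MEK2"]

relations = extract_relations(frame, lexicon)
for actor, verb, target in relations:
    print(actor, verb, target)
```

Pairing all actors with all targets is exactly where the assumption can fail, for example when an entity in ARG1 is only part of a modifier rather than the target of the verb, which is why the text treats its reliability as something to be measured.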

Methods

Results and Discussion

Conclusions

Journal: PLoS ONE	Publication Date: Jul 28, 2009
Citations: 65	License type: CC BY 4.0
