Sieve-based relation extraction of gene regulatory networks from biological literature.

Slavko Žitnik, Marko Bajec, Blaž Zupan, Marinka Žitnik

doi:10.1186/1471-2105-16-s16-s1

Open Access

https://doi.org/10.1186/1471-2105-16-s16-s1

Abstract

Background

Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and the results of related experiments. To make these relations available in an explicit, computer-readable format, they were at first curated manually into databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks.

Results

We develop a computational approach for extracting gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network of the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each able to extract a different relationship type. Following the shared task, we conducted additional analysis using different system settings, which reduced the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Features constructed from mention words and their prefixes and suffixes are the most important for extraction accuracy.
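The slot error rate reported above can be illustrated with a minimal sketch. It assumes the standard definition SER = (S + D + I) / N, where S, D and I count substituted, deleted (missed) and inserted (spurious) relations and N is the number of relations in the reference network. The matching rule used here (a relation is identified by its agent-target pair, and a wrong relation type counts as a substitution) is a simplifying assumption, not necessarily the exact BioNLP 2013 scoring procedure. The function name `slot_error_rate`, the relation labels and the gene names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed representation: each (agent, target) pair maps to one relation type.
Network = Dict[Tuple[str, str], str]


def slot_error_rate(predicted: Network, reference: Network) -> float:
    """Return SER = (S + D + I) / N for a predicted vs. a reference network.

    Simplified matching (assumption): a predicted relation with the same
    (agent, target) pair as a reference relation but a different type is
    a substitution.
    """
    if not reference:
        raise ValueError("reference network must contain at least one relation")

    # S: pair present in both networks, but with a different relation type.
    substitutions = sum(
        1 for pair, rel_type in predicted.items()
        if pair in reference and reference[pair] != rel_type
    )
    # D: reference relations the prediction missed entirely.
    deletions = sum(1 for pair in reference if pair not in predicted)
    # I: predicted relations with no counterpart in the reference.
    insertions = sum(1 for pair in predicted if pair not in reference)
    return (substitutions + deletions + insertions) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy networks, not data from the paper.
    reference = {
        ("geneA", "geneB"): "Activation",
        ("geneA", "geneC"): "Inhibition",
        ("geneB", "geneD"): "Transcription",
        ("geneC", "geneD"): "Activation",
    }
    predicted = {
        ("geneA", "geneB"): "Activation",     # correct
        ("geneA", "geneC"): "Activation",     # substitution (wrong type)
        ("geneB", "geneD"): "Transcription",  # correct
        ("geneD", "geneE"): "Inhibition",     # insertion (not in reference)
    }
    # ("geneC", "geneD") is missing from the prediction: one deletion.
    print(slot_error_rate(predicted, reference))  # prints 0.75
```

Because every reference relation is either correct, substituted or deleted, N here equals C + S + D, so a perfect prediction scores 0 and the score can exceed 1 when spurious relations dominate.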
Analysis of distances between different mention types in the text shows that transforming the data into skip-mention sequences is an appropriate choice for detecting relations between distant mentions.

Conclusions

Linear-chain conditional random fields, along with appropriate data transformations, can be used efficiently to extract relations. The sieve-based architecture simplifies the system, as new sieves can be easily added or removed and each sieve can use the results of the previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and are hence applicable to a broad range of relation extraction tasks and data domains.
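The skip-mention transformation can be sketched as follows. This is a minimal illustration, assuming that for a skip distance s the ordered mentions of a document are split into s + 1 interleaved sequences, the i-th holding every (s + 1)-th mention starting at position i. Mentions that are s positions apart then become neighbours, which a first-order (linear-chain) model can label. The function name `skip_mention_sequences` and the mention strings are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def skip_mention_sequences(mentions: Sequence[T], skip: int) -> List[List[T]]:
    """Split mentions into skip + 1 interleaved sequences.

    Sequence i holds mentions i, i + skip + 1, i + 2 * (skip + 1), ...,
    so mentions that are `skip` positions apart become adjacent.
    skip = 0 yields a single sequence equal to the input.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    step = skip + 1
    # Slicing with a step picks every (skip + 1)-th mention from offset i;
    # empty slices (fewer mentions than `step`) are dropped.
    return [list(mentions[i::step]) for i in range(step) if mentions[i::step]]


if __name__ == "__main__":
    mentions = ["M1", "M2", "M3", "M4", "M5"]
    print(skip_mention_sequences(mentions, 1))
    # prints [['M1', 'M3', 'M5'], ['M2', 'M4']]
    print(skip_mention_sequences(mentions, 2))
    # prints [['M1', 'M4'], ['M2', 'M5'], ['M3']]
```

With skip = 1, a linear-chain model sees M1 directly followed by M3, so a relation between those two mentions becomes a transition between adjacent labels. A full system would presumably run models over several skip distances and merge the relations found; which distances to use is the kind of choice the distance analysis above informs.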

Highlights

Relation extraction is an essential procedure in literature mining
Sieves with conditional random fields can be trained on arbitrary text data and are applicable to a broad range of relation extraction tasks and data domains
This paper aims to extract the sporulation gene regulatory network of Bacillus subtilis
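The sieve idea (sieves applied in sequence, each free to use the relations found by earlier ones) can be sketched as a simple pipeline. This is not the authors' implementation: the `Sieve` interface, the two toy sieves and the keyword rule are hypothetical stand-ins. In the described system a sieve would instead wrap a trained linear-chain conditional random field or a set of rules.

```python
from typing import Callable, List, Set, Tuple

# (agent, relation type, target); an illustrative representation.
Relation = Tuple[str, str, str]
# Hypothetical sieve interface: given the tokens of a sentence and all
# relations found by earlier sieves, return the relations this sieve adds.
Sieve = Callable[[List[str], Set[Relation]], Set[Relation]]


def run_sieves(tokens: List[str], sieves: List[Sieve]) -> Set[Relation]:
    """Apply sieves in order, accumulating the relations they return."""
    found: Set[Relation] = set()
    for sieve in sieves:
        # Each sieve reads everything found so far; its output is merged in.
        found |= sieve(tokens, found)
    return found


def keyword_sieve(tokens: List[str], found: Set[Relation]) -> Set[Relation]:
    """Toy rule standing in for a CRF: 'X activates Y' -> (X, Activation, Y)."""
    return {
        (tokens[i - 1], "Activation", tokens[i + 1])
        for i in range(1, len(tokens) - 1)
        if tokens[i] == "activates"
    }


def coordination_sieve(tokens: List[str], found: Set[Relation]) -> Set[Relation]:
    """Toy sieve that builds on earlier output: extends a relation whose
    target is followed by 'and Z' to the coordinated mention Z."""
    added: Set[Relation] = set()
    for agent, rel_type, target in found:
        for i in range(len(tokens) - 2):
            if tokens[i] == target and tokens[i + 1] == "and":
                added.add((agent, rel_type, tokens[i + 2]))
    return added


if __name__ == "__main__":
    tokens = "geneA activates geneB and geneC".split()
    relations = run_sieves(tokens, [keyword_sieve, coordination_sieve])
    print(sorted(relations))
    # prints [('geneA', 'Activation', 'geneB'), ('geneA', 'Activation', 'geneC')]
```

Adding or removing a sieve only means editing the list passed to `run_sieves`, which mirrors the modularity the abstract describes. Order matters: with `coordination_sieve` placed first, it sees no earlier relations and contributes nothing.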

Summary

Introduction

Relation extraction is an essential procedure in literature mining. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We are witnessing an unprecedented increase in the number of biomedical abstracts, experimental results, and phenotype and gene descriptions being deposited in publicly available databases, such as NCBI’s PubMed. This content represents potential new discoveries that could be inferred with appropriately designed natural language processing approaches. Biomedical literature mining is a compelling way to identify possible candidate genes through the integration of existing data.

Journal: BMC Bioinformatics
Publication Date: Oct 30, 2015
Citations: 52
License type: CC BY

Similar Papers

Beyond linear chain
Diego Marcheggiani
ACM SIGIR Forum | VOL. 48 | 26 Jun 2014

Connecting Distant Entities with Induction through Conditional Random Fields for Named Entity Recognition: Precursor-Induced CRF
Wangjin Lee ... Jinwook Choi
01 Jan 2018

Equivalence between LC-CRF and HMM, and Discriminative Computing of HMM-Based MPM and MAP
Elie Azeraf ... Emmanuel Monfrini
Algorithms | VOL. 16 | 21 Mar 2023

A conditional random field framework for language process in product review mining
Yue Ming ... Yu Wang
Multimedia Tools and Applications | VOL. 82 | 10 Jun 2022
