Abstract

Text mining (TM) in biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) and their relationships. Because TM is widely applicable to problems involving complex relationships, it is valuable to apply it to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (Metabolic Interaction Network Reconstruction module, MINR). The proposed framework performed well on the standard measures of recall, precision and F-score. Evaluation of the MEE module on the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of production and consumption events, respectively. When the entity tagger for Gene and Protein (GP) and metabolite entities was tested on the test corpus, the F-score exceeded 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score >70%. Finally, applying the integrated TM framework to large-scale data (i.e., EcoCyc extraction data) to reconstruct a metabolic interaction network gave reasonable precision of 69.93%, 70.63% and 46.71% for enzymes, metabolites and enzyme–metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network, and it can be a powerful tool to help biologists extract metabolic events for further reconstruction of metabolic interaction networks.
The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
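
Precision, recall and F-score, the measures used throughout this evaluation, follow standard definitions. A minimal Python sketch, assuming predictions have already been matched against gold annotations and tallied as true positives (TP), false positives (FP) and false negatives (FN). The matching criteria used in this study are not specified here, the F-score is assumed to be the balanced F1, and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one evaluation (e.g., one event type)."""
    tp: int  # predictions that match a gold annotation
    fp: int  # predictions with no matching gold annotation
    fn: int  # gold annotations that were not predicted


def precision(c: Counts) -> float:
    # Fraction of predictions that are correct; 0.0 if nothing was predicted.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    # Fraction of gold annotations that were recovered; 0.0 if there is no gold data.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_score(c: Counts) -> float:
    # Balanced F-score (F1): the harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the study).
    example = Counts(tp=30, fp=10, fn=20)
    print(f"P={precision(example):.2%} R={recall(example):.2%} F={f_score(example):.2%}")
```

With the hypothetical counts above, this prints `P=75.00% R=60.00% F=66.67%`.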

Highlights

  • Biological literature is vast and quickly growing

  • The Metabolic Entities (ME) corpus and an integrated text mining (TM) framework for the reconstruction of a metabolic interaction network were developed in this study

  • To illustrate how a biologist can apply our integrated TM framework to reconstruct the Superpathway of leucine, valine, and isoleucine biosynthesis, we show two example sentences from PMID-1646790 together with the results obtained from the Metabolic Event Extraction (MEE) and Metabolic Interaction Network Reconstruction (MINR) modules
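
The MEE and MINR modules are described here only at a high level. As a rough illustration of the general idea, not the MINR implementation, the following sketch assumes each extracted event is a hypothetical (enzyme, metabolite, event type) tuple, with production and consumption as the event types, and links the events into a directed enzyme–metabolite graph from which substrate → enzyme → product conversions can be read off. The enzyme and metabolite names are simplified, illustrative examples from branched-chain amino acid biosynthesis, not output obtained from PMID-1646790.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event record: (enzyme, metabolite, event_type), where event_type
# is "production" or "consumption". This layout is an assumption for illustration.
Event = Tuple[str, str, str]


def build_network(events: Iterable[Event]) -> Dict[str, Set[str]]:
    """Build a directed enzyme-metabolite graph as an adjacency map.

    consumption: metabolite -> enzyme (the enzyme uses the metabolite)
    production:  enzyme -> metabolite (the enzyme yields the metabolite)
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for enzyme, metabolite, kind in events:
        if kind == "consumption":
            graph[metabolite].add(enzyme)
        elif kind == "production":
            graph[enzyme].add(metabolite)
        else:
            raise ValueError(f"unknown event type: {kind!r}")
    return dict(graph)


def conversions(graph: Dict[str, Set[str]], enzymes: Set[str]) -> Set[Tuple[str, str, str]]:
    """Read (substrate, enzyme, product) triples off two-step paths in the graph."""
    triples: Set[Tuple[str, str, str]] = set()
    for substrate, targets in graph.items():
        if substrate in enzymes:
            continue  # only metabolite nodes act as substrates
        for enzyme in targets:
            for product in graph.get(enzyme, set()):
                triples.add((substrate, enzyme, product))
    return triples


if __name__ == "__main__":
    # Illustrative, simplified events (not extracted from PMID-1646790).
    events = [
        ("acetohydroxy acid synthase", "pyruvate", "consumption"),
        ("acetohydroxy acid synthase", "2-acetolactate", "production"),
        ("threonine deaminase", "L-threonine", "consumption"),
        ("threonine deaminase", "2-oxobutanoate", "production"),
    ]
    enzymes = {enzyme for enzyme, _, _ in events}
    for triple in sorted(conversions(build_network(events), enzymes)):
        print(" -> ".join(triple))
```

This prints `L-threonine -> threonine deaminase -> 2-oxobutanoate` followed by `pyruvate -> acetohydroxy acid synthase -> 2-acetolactate`. Edges point from consumed metabolites to enzymes and from enzymes to produced metabolites, so each conversion is simply a two-step path through an enzyme node.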

Summary

Introduction

Biological literature is vast and quickly growing. Text mining (TM) has become a routine analysis tool for rapidly scanning the entire literature, with the essential goal of extracting relationships between named biological entities and concepts. To address the challenges posed by biological complexity, TM tasks have recently advanced from simple interaction extraction towards a better understanding of the semantics behind biological interactions through the analysis of associated events. This task is known as event extraction. BioNLP-ST’13 (Kim, Wang & Yasunori, 2013) focused on complex relationships, especially those related to biomolecular reactions, pathways and regulatory networks (Van Landeghem & Ginter, 2011; McClosky et al., 2012; Gerner et al., 2012; Bossy, Bessières & Nédellec, 2013; Ohta et al., 2013). The Pathway Curation (PC) task of BioNLP-ST’13, presented by Ohta et al. (2013), introduced an event extraction task setting that accounts for metabolic pathways.
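
Shared-task event annotations are commonly distributed in a standoff format, where text-bound lines (IDs starting with `T`) mark entity and trigger spans by character offsets, and event lines (IDs starting with `E`) link a trigger to typed arguments. A minimal Python parser sketch, assuming single-span text-bound annotations and ignoring other annotation kinds; the sentence, annotation IDs and the `Conversion`/`Theme`/`Product` labels are illustrative assumptions rather than data from the PC task corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextBound:
    ann_id: str
    label: str  # entity type or event-trigger type
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    text: str


@dataclass
class Event:
    ann_id: str
    label: str
    trigger: str  # ID of the trigger's text-bound annotation
    args: List[Tuple[str, str]] = field(default_factory=list)  # (role, referenced ID)


def parse_standoff(lines: List[str]) -> Tuple[Dict[str, TextBound], Dict[str, Event]]:
    """Parse text-bound (T) and event (E) lines; other line kinds are skipped."""
    spans: Dict[str, TextBound] = {}
    events: Dict[str, Event] = {}
    for line in lines:
        if not line.strip():
            continue
        ann_id, rest = line.rstrip("\n").split("\t", 1)
        if ann_id.startswith("T"):
            # e.g. "T2\tSimple_chemical 14 22\tpyruvate" (one contiguous span assumed)
            span, text = rest.split("\t", 1)
            label, start, end = span.split(" ")
            spans[ann_id] = TextBound(ann_id, label, int(start), int(end), text)
        elif ann_id.startswith("E"):
            # e.g. "E1\tConversion:T4 Theme:T2 Product:T3"
            head, *arg_fields = rest.split()
            label, trigger = head.split(":", 1)
            args = []
            for arg in arg_fields:
                role, ref = arg.split(":", 1)
                args.append((role, ref))
            events[ann_id] = Event(ann_id, label, trigger, args)
    return spans, events


def describe(event: Event, spans: Dict[str, TextBound]) -> str:
    """Render an event with argument text; nested event IDs are shown as-is."""
    parts = [f"{role}={spans[ref].text if ref in spans else ref}" for role, ref in event.args]
    return f"{event.label}({spans[event.trigger].text}: {', '.join(parts)})"


# Hypothetical example sentence and annotations (offsets index into TEXT).
TEXT = "AHAS converts pyruvate to 2-acetolactate."
ANNOTATIONS = [
    "T1\tGene_or_gene_product 0 4\tAHAS",
    "T2\tSimple_chemical 14 22\tpyruvate",
    "T3\tSimple_chemical 26 40\t2-acetolactate",
    "T4\tConversion 5 13\tconverts",
    "E1\tConversion:T4 Theme:T2 Product:T3",
]

if __name__ == "__main__":
    spans, events = parse_standoff(ANNOTATIONS)
    assert all(TEXT[s.start:s.end] == s.text for s in spans.values())
    print(describe(events["E1"], spans))
```

This prints `Conversion(converts: Theme=pyruvate, Product=2-acetolactate)`. Keeping entities and events in separate ID-keyed maps means an event argument can point either to an entity or to another event, which is how nested events are usually represented.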
