Extraction of chemical-protein interactions from the literature using neural networks and narrow instance representation.

Rui Antunes, Sérgio Matos

doi:10.1093/database/baz095

Abstract

The scientific literature contains large amounts of information on genes, proteins, chemicals and their interactions. Extraction and integration of this information in curated knowledge bases help researchers support their experimental results, leading to new hypotheses and discoveries. This is especially relevant for precision medicine, which aims to understand the individual variability across patient groups in order to select the most appropriate treatments. Methods for improved retrieval and automatic relation extraction from biomedical literature are therefore required for collecting structured information from the growing number of published works. In this paper, we follow a deep learning approach for extracting mentions of chemical–protein interactions from biomedical articles, based on various enhancements over our participation in the BioCreative VI CHEMPROT task. A significant aspect of our best method is the use of a simple deep learning model together with a very narrow representation of the relation instances, using only up to 10 words from the shortest dependency path and the respective dependency edges. Bidirectional long short-term memory recurrent networks or convolutional neural networks are used to build the deep learning models. We report the results of several experiments and show that our best model is competitive with more complex sentence representations or network structures, achieving an F1-score of 0.6306 on the test set. The source code of our work, along with detailed statistics, is publicly available.
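The narrow instance representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a dependency parse is already available as (head, dependent, label) triples; the function names, the toy sentence, its parse and the choice to keep the first 10 words of a longer path are all illustrative assumptions, not the authors' implementation.

```python
from collections import deque

MAX_WORDS = 10  # upper bound on SDP words, as stated in the abstract


def shortest_dependency_path(edges, source, target):
    """Return (words, edge_labels) for the shortest undirected path between
    two tokens, or None if they are not connected.

    edges: iterable of (head, dependent, label) triples. Tokens are used as
    node identifiers here; a real system would use token indices instead.
    """
    # Build an undirected adjacency list that remembers each edge label.
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))

    # Breadth-first search finds a shortest path in an unweighted graph.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, label in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, label)
                queue.append(neighbour)
    if target not in previous:
        return None

    # Walk back from the target to the source, then reverse.
    words, labels = [target], []
    while previous[words[-1]] is not None:
        parent, label = previous[words[-1]]
        labels.append(label)
        words.append(parent)
    words.reverse()
    labels.reverse()
    return words, labels


def narrow_instance(words, labels, max_words=MAX_WORDS):
    """Cap the path length (assumption: keep the first max_words words)."""
    words = words[:max_words]
    return words, labels[:max(len(words) - 1, 0)]


# Hypothetical parse of "Aspirin inhibits the activity of COX-1".
edges = [
    ("inhibits", "Aspirin", "nsubj"),
    ("inhibits", "activity", "obj"),
    ("activity", "the", "det"),
    ("activity", "COX-1", "nmod"),
    ("COX-1", "of", "case"),
]
words, labels = narrow_instance(*shortest_dependency_path(edges, "Aspirin", "COX-1"))
print(words)   # ['Aspirin', 'inhibits', 'activity', 'COX-1']
print(labels)  # ['nsubj', 'obj', 'nmod']
```

In this toy case the words "the" and "of" fall outside the path, so the model would see only four words and three dependency edges instead of the whole sentence.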

Highlights

As the knowledge of how biological systems work at different structural levels grows, more possibilities arise for applying it in diagnosing and treating common and complex diseases.
As noted in the previous section, the use of different random states generates different training and validation subsets, which in turn results in different trained models. This approach allows using a large amount of data for early stopping, which in our preliminary experiments proved important for improving generalization, while still using most of the available data for training.
The three best results on the development set (F1-scores: 0.6496, 0.6473 and 0.6385) were obtained by the bidirectional long short-term memory (BiLSTM) model using only the shortest dependency path (SDP) with word and dependency features. These three results differ in the embedding model used, with the highest achieved using the biomedical word embeddings created by Chen et al. [57].
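The effect of the random state described in the highlights above can be sketched with the standard library alone. The function name, the 20% validation fraction and the integer stand-ins for relation instances are assumptions made for illustration; the source does not state the split proportions here.

```python
import random


def split_train_validation(instances, validation_fraction, random_state):
    """Shuffle with a seeded generator and split off a validation subset.

    The validation subset would be used for early stopping; the remainder
    is used for training. Returns (train, validation).
    """
    rng = random.Random(random_state)  # same seed -> same split
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Integer stand-ins for annotated relation instances (hypothetical data).
instances = list(range(100))

# One split per random state; each would yield a differently trained model.
splits = {
    seed: split_train_validation(instances, 0.2, seed) for seed in (0, 1, 2)
}

for seed, (train, validation) in splits.items():
    print(seed, len(train), len(validation))
```

Each seed partitions the same data differently, so every instance can serve for training under some seeds and for early stopping under others.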

Summary

Introduction

As the knowledge of how biological systems work at different structural levels grows, more possibilities arise for applying it in diagnosing and treating common and complex diseases. Relevant fine-grained information is constantly being communicated in the form of natural language through scientific publications. To exploit this source of updated knowledge, several methods have been proposed for retrieving relevant articles for database curation [2], and for extracting from the unstructured texts information such as entity mentions [3, 4], biomolecular interactions and events [5, 6] or the clinical and pharmacological impact of genetic mutations [7]. These methods have proven essential for collecting the most recent research results and for expediting database curation [8]. The development of systems able to automatically extract such relations may expedite curation work and contribute to the amount of information available in structured annotation databases, in a form that researchers can search and retrieve.

Journal: Database: the journal of biological databases and curation
Publication Date: Jan 1, 2019
Citations: 9
License type: CC BY 4.0
