An end-to-end deep learning architecture for extracting protein-protein interactions affected by genetic mutations.

Tung Tran, Ramakanth Kavuluru

doi:10.1093/database/bay092

Abstract

The BioCreative VI Track IV (mining protein interactions and mutations for precision medicine) challenge was organized in 2017 with the goal of applying biomedical text mining methods to support advancements in precision medicine approaches. As part of the challenge, a new dataset was introduced for the purpose of building a supervised relation extraction model capable of taking a test article and returning a list of interacting protein pairs identified by their Entrez Gene IDs. Specifically, such pairs represent proteins participating in a binary protein–protein interaction relation where the interaction is additionally affected by a genetic mutation; we refer to this as a PPIm relation. In this study, we explore an end-to-end approach for PPIm relation extraction by deploying a three-component pipeline involving deep learning-based named-entity recognition and relation classification models along with a knowledge-based approach for gene normalization. We propose several recall-focused improvements to our original challenge entry, which placed second both when matching on Entrez Gene ID (exact matching) and when matching on HomoloGene ID. On exact matching, the improved system achieved new competitive test results of 37.78% micro-F1 with a precision of 38.22% and a recall of 37.34%, which corresponds to an improvement over the prior best system of approximately three micro-F1 points. When matching on HomoloGene IDs, we report similarly competitive test results of 46.17% micro-F1 with a precision and recall of 46.67% and 45.59%, respectively, corresponding to an improvement of more than eight micro-F1 points over the prior best result. The code for our deep learning system is made publicly available at https://github.com/bionlproc/biocppi_extraction.
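To make the two matching modes concrete, the following is a minimal sketch of micro-averaged precision, recall and F1 over per-article sets of gene pairs. It is not the official challenge scorer: the function names, the toy gene IDs (`g1`, `g2`, ...) and the toy homolog-group mapping are all made up for illustration, and we assume that pairs are unordered and that counts are pooled over articles before the scores are computed.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Pair = FrozenSet[str]  # unordered pair of IDs; a self-interaction has one element


def to_pairs(raw: Iterable[Tuple[str, str]],
             group_of: Optional[Dict[str, str]] = None) -> Set[Pair]:
    """Build unordered pairs, optionally mapping each gene ID to a group ID."""
    group_of = group_of or {}
    return {frozenset((group_of.get(a, a), group_of.get(b, b))) for a, b in raw}


def micro_prf(gold: Dict[str, Set[Pair]],
              pred: Dict[str, Set[Pair]]) -> Tuple[float, float, float]:
    """Pool true/false positives and false negatives over all articles."""
    tp = fp = fn = 0
    for doc_id in set(gold) | set(pred):
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical IDs, not taken from the challenge dataset).
gold_raw = {"doc1": [("g1", "g2"), ("g3", "g4")], "doc2": [("g5", "g5")]}
pred_raw = {"doc1": [("g2", "g1"), ("g3", "g9")]}
homolog_group = {"g4": "h1", "g9": "h1"}  # g4 and g9 share a homolog group

exact = micro_prf({d: to_pairs(v) for d, v in gold_raw.items()},
                  {d: to_pairs(v) for d, v in pred_raw.items()})
relaxed = micro_prf({d: to_pairs(v, homolog_group) for d, v in gold_raw.items()},
                    {d: to_pairs(v, homolog_group) for d, v in pred_raw.items()})
print("exact  :", [round(x, 3) for x in exact])
print("relaxed:", [round(x, 3) for x in relaxed])
```

In this toy case the prediction `("g3", "g9")` is wrong under exact matching but counts as correct once both sides are mapped to homolog groups, which is why the relaxed scores are higher.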

Highlights

Precision medicine is an emerging disease treatment paradigm in which healthcare is customized to each individual patient
In this paper we exclusively focus on the PPIm extraction task and propose a pipeline of the following three modular components: named entity recognition (NER), gene mention normalization (GN) and relation classification (RC)
We propose the use of a deep neural network system based on a convolutional neural network (CNN)–long short-term memory (LSTM) hybrid model initially proposed by Chiu et al. [6] for NER
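The data flow through the three components named above (NER, then GN, then RC) can be sketched as follows. This is only an illustration of how such stages might be chained: the stand-in components are trivial lookups and keyword checks rather than the neural and knowledge-based models the paper describes, and every name, lexicon entry and ID here is hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Set, Tuple


def extract_ppim(text: str,
                 recognize: Callable[[str], List[str]],
                 normalize: Callable[[str], Optional[str]],
                 classify: Callable[[str, str, str], bool]) -> Set[Tuple[str, str]]:
    """Chain NER -> gene normalization -> relation classification."""
    mentions = recognize(text)                      # NER: gene mention strings
    gene_ids = set()
    for mention in mentions:
        gene_id = normalize(mention)                # GN: mention -> gene ID
        if gene_id is not None:
            gene_ids.add(gene_id)
    # RC: test every candidate pair of distinct normalized genes.
    return {(a, b) for a, b in combinations(sorted(gene_ids), 2)
            if classify(text, a, b)}


# Stand-in components (hypothetical; for illustration only).
TOY_LEXICON = {"GENEA": "id:1", "GENEB": "id:2"}


def toy_recognize(text: str) -> List[str]:
    return [tok for tok in text.replace(".", " ").split() if tok in TOY_LEXICON]


def toy_normalize(mention: str) -> Optional[str]:
    return TOY_LEXICON.get(mention)


def toy_classify(text: str, gene_a: str, gene_b: str) -> bool:
    return "mutation" in text.lower()


sentence = "A mutation in GENEA disrupts its binding to GENEB."
print(extract_ppim(sentence, toy_recognize, toy_normalize, toy_classify))
```

Passing the stages in as callables keeps each component replaceable, which matches the modular framing of the pipeline; the pair-enumeration step is an assumption of this sketch.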

Summary

Introduction

Precision medicine is an emerging disease treatment paradigm in which healthcare is customized to each individual patient. We refer to this particular type of relation, where the participants of a PPI are affected by a mutation, as a PPIm relation. This challenge is important because there has been a lack of tools that allow for the extraction of such interactions from the biomedical literature, despite their potential to support approaches in precision medicine.

Convolutional neural networks (CNNs) in particular were originally developed for image recognition tasks [18] and have been successfully applied to the text domain by exploiting so-called neural word embeddings [17, 33]. These word embeddings represent words as vectors; they can be pre-trained using unsupervised methods and further trained when learning a specific task. Using CNNs along with neural word embeddings has been shown to be effective in many natural language tasks (including text classification and relation extraction) since they naturally capture syntactic and semantic information [3, 7, 24].
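As a rough illustration of how a CNN operates on word embeddings, the sketch below slides filters over a sequence of word vectors and keeps the maximum activation of each filter (max-over-time pooling). It is written in plain Python for readability and is not the architecture used in the paper: the two-dimensional embeddings and the filter weights are made-up values, whereas in practice both would be learned.

```python
from typing import Dict, List

Vector = List[float]

# Made-up 2-dimensional embeddings; real ones are pre-trained and much larger.
EMBEDDINGS: Dict[str, Vector] = {
    "gene": [1.0, 0.0],
    "binds": [0.0, 1.0],
    "protein": [1.0, 1.0],
}


def embed(tokens: List[str]) -> List[Vector]:
    """Look up one vector per token."""
    return [EMBEDDINGS[tok] for tok in tokens]


def conv_max_pool(embedded: List[Vector], filt: List[Vector]) -> float:
    """Apply one filter to every window of words, then ReLU and max-pool."""
    width = len(filt)
    if len(embedded) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        score = sum(w * x
                    for filt_row, word_vec in zip(filt, window)
                    for w, x in zip(filt_row, word_vec))
        activations.append(max(0.0, score))  # ReLU
    return max(activations)                  # max-over-time pooling


def sentence_features(tokens: List[str], filters: List[List[Vector]]) -> Vector:
    """One pooled feature per filter; this vector would feed a classifier."""
    embedded = embed(tokens)
    return [conv_max_pool(embedded, f) for f in filters]


# Two width-2 filters with made-up weights.
FILTERS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
print(sentence_features(["gene", "binds", "protein"], FILTERS))
```

Because pooling keeps only the strongest response of each filter, the resulting feature vector has a fixed length regardless of sentence length.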

Methods

Results

Conclusion


Journal: Database	Publication Date: Jan 1, 2018
Citations: 39	License type: CC BY 4.0
