Automated recognition of functional compound-protein relationships in literature.

Kersten Döring, Mingjie Gao, Ammar Qaseem, Aurélien F A Moumbock, Pankaj Mishra, Jianyu Li, Michael Becer, Kiran K Telukunta, Philippe Thomas, Florian Sauter, Pascal Kirchner, Stefan Günther

doi:10.1371/journal.pone.0220925

Abstract

Motivation: Much effort has been invested in identifying protein-protein interactions with text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from the literature has received much less attention, and no ready-to-use open-source software has so far been available for this task.

Method: We created a new benchmark dataset of 2,613 sentences from abstracts, annotated with proteins, small molecules, and their relationships. Two kernel methods, the shallow linguistic (SL) kernel and the all-paths graph (APG) kernel, were applied to classify these relationships as functional or non-functional. Furthermore, we evaluated the benefit of interaction verbs in sentences.

Results: In cross-validation on our benchmark dataset, the APG kernel (AUC: 84.6%, F1 score: 79.0%) performs slightly better than the SL kernel (AUC: 82.5%, F1 score: 77.2%). Both models achieve state-of-the-art performance in relation extraction. A combination of the SL and APG kernels could increase the overall performance slightly further. We used each of the two kernels to identify functional relationships in all 29 million PubMed abstracts and provide the results, including the recorded processing times.

Availability: The software for the tested kernels, the benchmark dataset, the processed 29 million PubMed abstracts, all evaluation scripts, and the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.
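The AUC and F1 values above summarize cross-validated classification quality. As a generic illustration of what the two metrics measure, and not a reproduction of the authors' evaluation scripts (which are in the repository), the following Python sketch computes the F1 score of the positive class (1 = functional relationship) from thresholded predictions, and the AUC as the fraction of (functional, non-functional) pairs in which the functional pair receives the higher score, with ties counting one half. The labels, scores, and the 0.5 decision threshold are made-up toy values.

```python
from itertools import product


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1 = functional)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = functional compound-protein pair, 0 = non-functional.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # hypothetical classifier decision values
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(f"F1  = {f1_score(y_true, y_pred):.3f}")  # F1  = 0.667
    print(f"AUC = {roc_auc(y_true, scores):.3f}")  # AUC = 0.889
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; for thousands of sentence pairs a rank-based formula or a library routine would be the usual choice. Note that F1 depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.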

Highlights

Interactions of biomolecules are essential for most cellular processes, including metabolism, signaling, regulation, and proliferation [1].
The shallow linguistic (SL) and all-paths graph (APG) kernels have already been applied in different domains, e.g., to the extraction of protein-protein interactions, drug-drug interactions, and neuroanatomical statements.
The approach presented here focuses on extracting functional compound-protein relationships from the literature.
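The shallow linguistic (SL) kernel compares two candidate entity pairs using only shallow information such as tokens, without a full syntactic parse. One of its parts is a global context kernel that compares the words before and between the two entities, between them, and between and after them. The following Python sketch is a simplified, hypothetical version of that global-context idea, not the implementation used in the paper: it uses unigram counts only, compares each segment by cosine similarity, and omits the local context component (features of tokens in windows around each entity). The placeholder tokens COMPOUND and PROTEIN and all function names are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def segments(tokens, e1, e2):
    """Split a tokenized sentence around two entity positions (e1 < e2) into the
    fore-between, between, and between-after token sequences."""
    fore, between, after = tokens[:e1], tokens[e1 + 1:e2], tokens[e2 + 1:]
    return {
        "fore_between": fore + between,
        "between": between,
        "between_after": between + after,
    }


def cosine(a, b):
    """Normalized dot product of two bag-of-words count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def global_context_kernel(x, y):
    """Sum of the cosine similarities of the three context segments of x and y.
    Each argument is a (tokens, e1, e2) triple describing one candidate pair."""
    sx, sy = segments(*x), segments(*y)
    return sum(cosine(Counter(sx[k]), Counter(sy[k])) for k in sx)


if __name__ == "__main__":
    # Hypothetical sentences with the two entities replaced by placeholder tokens.
    s1 = ("COMPOUND strongly inhibits PROTEIN activity".split(), 0, 3)
    s2 = ("COMPOUND inhibits PROTEIN in cells".split(), 0, 2)
    print(f"{global_context_kernel(s1, s1):.3f}")  # 3.000
    print(f"{global_context_kernel(s1, s2):.3f}")  # 1.748
```

Because each segment term is a normalized dot product of count vectors, the sum is itself a valid (positive semi-definite) kernel, so a Gram matrix built from it could in principle be handed to an SVM as a precomputed kernel.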

Journal: PLoS ONE
Publication date: Mar 3, 2020
Citations: 4
License: CC BY 4.0

Similar Papers

From words to pixels: text and image mining methods for service research
Francisco Villarroel Ordenes ... Shunyuan Zhang
Journal of Service Management, Vol. 30, 09 Oct 2019

Machine Learning and Network Methods for Biology and Medicine
Lei Chen ... Chuan Lu
Computational and Mathematical Methods in Medicine, Vol. 2015, 01 Jan 2015

Protein-Protein Interaction Identification Using a Similarity-Constrained Graph Model
Yun Niu ... Yuwei Wang
IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 16, 24 Nov 2017

Overall Survival Prognostic Modelling of Non-small Cell Lung Cancer Patients Using Positron Emission Tomography/Computed Tomography Harmonised Radiomics Features: The Quest for the Optimal Machine Learning Algorithm
Mehdi Amini ... Habib Zaidi
Clinical Oncology, Vol. 34, 03 Dec 2021
