Automatic extraction of protein-protein interactions using grammatical relationship graph

Kaixian Yu, Pei-Yau Lung, Yan-Yuan Tseng, Peixiang Zhao, Tingting Zhao, Jinfeng Zhang

doi:10.1186/s12911-018-0628-4

Abstract

Background: Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatically extracting such information and storing it in structured form could help researchers access it more easily and would make it possible to incorporate it into advanced integrative analyses. In this study, we developed a novel approach to extract bio-entity relationship information using Natural Language Processing (NLP) and a graph-theoretic algorithm.

Methods: Our method, called GRGT (Grammatical Relationship Graph for Triplets), extracts not only the pairs of terms that have a certain relationship, but also the type of relationship (the word describing it). In addition, the directionality of the relationship can be extracted. Our method is based on the assumption that a triplet exists for each interacting pair. A triplet is defined as two terms (entities) and an interaction word describing the relationship between the two terms in a sentence. We first use a sentence parsing tool to obtain the sentence structure, represented as a dependency graph in which words are nodes and edges are typed dependencies. The shortest paths among the pairs of words in the triplet are then extracted; these form the basis of our information extraction method. A flexible pattern-matching scheme is then used to match a triplet graph with an unknown relationship against the labeled (True or False) triplet graphs in the database.

Results: We applied the method to three benchmark datasets to extract protein-protein interactions (PPIs) and obtained better precision than the top-performing methods in the literature.

Conclusions: We have developed a method to extract protein-protein interactions from biomedical literature. PPIs extracted by our method have higher precision than those extracted by other methods, suggesting that our method can be used to effectively extract PPIs and deposit them into databases. Beyond extracting PPIs, our method could easily be extended to extract relationship information between other bio-entities.
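
The shortest-path step described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sentence, the hand-written dependency edges (a real pipeline would take them from a sentence parser), and the helper names `build_graph`, `shortest_path` and `triplet_paths` are hypothetical and are not taken from GRGT.

```python
from collections import deque

# Hand-written, illustrative typed dependencies for the toy sentence
# "ProteinA directly binds ProteinB" (NOT real parser output).
# Each edge is (head word, dependent word, dependency type).
EDGES = [
    ("binds", "ProteinA", "nsubj"),
    ("binds", "ProteinB", "dobj"),
    ("binds", "directly", "advmod"),
]


def build_graph(edges):
    """Build an undirected adjacency list: word -> [(neighbor, dependency type)]."""
    graph = {}
    for head, dependent, dep_type in edges:
        graph.setdefault(head, []).append((dependent, dep_type))
        graph.setdefault(dependent, []).append((head, dep_type))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of words from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor, _dep_type in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def triplet_paths(graph, entity1, entity2, interaction_word):
    """Shortest paths among the three word pairs of a candidate triplet."""
    return {
        "entity1-interaction": shortest_path(graph, entity1, interaction_word),
        "interaction-entity2": shortest_path(graph, interaction_word, entity2),
        "entity1-entity2": shortest_path(graph, entity1, entity2),
    }


graph = build_graph(EDGES)
paths = triplet_paths(graph, "ProteinA", "ProteinB", "binds")
for name, path in paths.items():
    print(name, path)
```

In this toy graph the path between the two entities passes through the interaction word, which is the kind of structural cue a triplet graph can expose to a downstream matcher.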

Highlights

Relationships between bio-entities constitute a significant part of our knowledge
We propose a method based on Natural Language Processing (NLP) that automatically learns rules/patterns to extract protein-protein interaction (PPI) triplets from sentences
Our method, Grammatical Relationship Graph for Triplets (GRGT), uses the grammatical relationships within each PPI triplet, obtained with NLP techniques and a graph-theoretic algorithm, as features to build a classifier
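
As a rough illustration of how grammatical relationships could serve as features for classifying a candidate triplet, the Python sketch below matches an unlabeled triplet signature against a small labeled set. This is a deliberately simplified stand-in and not the matching scheme used by GRGT: the signature format (three tuples of dependency types, one per word pair in the triplet), the similarity measure, the threshold, and the names `similarity`, `classify` and `LABELED` are all assumptions.

```python
# A triplet "signature" here is a tuple of three dependency-type sequences:
# (entity1 -> interaction word, interaction word -> entity2, entity1 -> entity2).
# The labeled examples below are invented for illustration only.
LABELED = [
    ((("nsubj",), ("dobj",), ("nsubj", "dobj")), True),
    ((("nmod",), ("nmod",), ("conj",)), False),
]


def similarity(sig_a, sig_b):
    """Fraction of the three path slots whose dependency-type sequences are identical."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def classify(signature, labeled, threshold=2 / 3):
    """Return the label of the most similar labeled signature,
    or None when no labeled signature reaches the threshold."""
    best_label, best_score = None, 0.0
    for known_signature, label in labeled:
        score = similarity(signature, known_signature)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


exact = (("nsubj",), ("dobj",), ("nsubj", "dobj"))
weak = (("nsubj",), ("nmod",), ("nsubj", "nmod"))
print(classify(exact, LABELED))
print(classify(weak, LABELED))
```

Returning `None` for weak matches keeps the sketch conservative: a candidate is labeled only when it closely resembles a known pattern, which loosely mirrors a precision-oriented design.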

Summary

Introduction

Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Computational methods have been designed to extract bio-entity relationships automatically from the literature and are used to assist scientists who build databases through manual annotation [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]. Most PPI extraction methods follow one of two approaches: (1) specifying rules (or patterns, templates, etc.) manually [34,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66]; or (2) inferring/learning the rules computationally from manually labeled sentences [67,68,69].



Journal: BMC Medical Informatics and Decision Making
Publication Date: Jul 1, 2018
Citations: 24
License type: open-access
