Abstract

BackgroundProtein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a deep learning technique which is sequence-based that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks such as their degree, position, and neighboring nodes in a graph has been proved to be informative in PPI prediction.ResultsFacing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Statistically, Our method achieves state-of-the-art accuracy of 99.15% on Human protein reference database (HPRD) dataset and also obtains best results on Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) datasets.ConclusionHere, we introduce signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and the generalization for a new dataset that consists of four datasets: HPRD, E.coli, C.elegan, and Drosophila.

Highlights

  • Protein-protein interactions (PPIs) are central to many biological processes

  • We firstly evaluated the performance of the proposed method for predicting five different datasets: Human protein reference database (HPRD) dataset, Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) by using different evaluation measures

  • We can conduct representation learning with graph and simplified molecular input line entry specification (SMILES) string features of drug molecules using signed variational graph auto-encoder (S-variational graph auto-encoder (VGAE)) to predict drug interactions

Read more

Summary

Introduction

Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. The rapid development of high-throughput technologies are used to detect protein interactions, such as yeast two-hybrid screens (Y2H) [3], tandem affinity purification (TAP) [4] and mass spectrometric protein complex identification (MS-PCI) [5], Tandem Affinity Purification and Mass Spectrometry (TAP-MS) [6], affinity chromatography and Co-Immunoprecipitation (Co-IP) [7]. These experimental methods have contributed to exponential growth of the number of PPIs of various species, but the functional annotation of both proteins and their interactions is updated at a slow speed. High-throughput computational methods that are useful for the study of protein functions are required for discovering PPI with high quality and accuracy [11]

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call