Abstract

Background: Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. However, many true complexes are not dense subgraphs, so these approaches show limited performance in detecting protein complexes from PPI networks.

Results: To address this problem, we propose a supervised learning method based on network node embeddings, which uses the informative properties of known complexes to guide the search for new protein complexes. First, node embeddings are obtained from a human protein interaction network. The protein interactions are then weighted by the similarities between node embeddings. Next, the supervised learning method is used to detect candidate protein complexes. Finally, a random forest model filters the candidate complexes to obtain the final predicted complexes. Experimental results on real human and yeast protein interaction networks show that our method improves the performance of protein complex detection.

Conclusions: We provide a new method for identifying protein complexes from human and yeast protein interaction networks, which has the potential to benefit the field of protein complex detection.
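The edge-weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names (`cosine_similarity`, `weight_interactions`) and the toy embeddings are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weight_interactions(edges, embeddings):
    """Assign each interaction (p, q) the similarity of its endpoint embeddings.

    Assumption: cosine similarity stands in for the paper's unspecified measure.
    """
    return {
        (p, q): cosine_similarity(embeddings[p], embeddings[q])
        for p, q in edges
    }


# Toy, made-up embeddings for three hypothetical proteins.
embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
edges = [("A", "B"), ("A", "C")]
weights = weight_interactions(edges, embeddings)
```

In this toy input, the interaction between proteins with identical embeddings receives the highest weight and the interaction between proteins with orthogonal embeddings receives the lowest, which is the behavior a similarity-based weighting is meant to provide.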

Highlights

  • Protein complexes are one of the keys to deciphering the behavior of a cell system

  • The gold standard set of human protein complexes was downloaded from the Human Protein Reference Database (HPRD), while the gold standard set of yeast protein complexes was constructed by combining MIPS [31], Aloy [32], the Saccharomyces Genome Database (SGD) [33], and TAP06 [34]

  • The training sets contain three categories of samples. For human, 1521 true complexes from the HPRD database are used as positive samples, 1175 complexes predicted by the COACH method as intermediate samples, and 2135 subgraphs obtained by randomly selecting nodes as negative samples

Summary

Introduction

Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. With advances in human genomics and high-throughput techniques, massive amounts of PPI data have been generated. These data make it possible to detect protein complexes automatically from PPI networks. Dongen et al. [6] proposed a protein complex discovery algorithm named MCL, which manipulates the adjacency matrix of yeast PPI networks with two operators called expansion and inflation. By iterating these two operators, it finds the clusters that are more likely to be protein complexes. Liu et al. [9] proposed an algorithm named CMC for protein complex
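The expansion and inflation operators attributed to MCL above can be sketched as follows. This is a minimal illustration of the general Markov clustering idea, not the reference implementation from [6]: the added self-loops, the inflation exponent of 2.0, the fixed iteration count, and the 1e-6 membership threshold are all assumptions made for this example.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Alternate expansion and inflation, then read clusters off the rows."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy network: two disconnected triangles (nodes 0-2 and nodes 3-5).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
```

On this toy network the procedure separates the two triangles into two clusters. Real PPI networks are far larger and sparser, so a practical implementation would use sparse matrices and a convergence check instead of a fixed iteration count.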

