An effective approach to detecting both small and large complexes from protein-protein interaction networks

Bin Xu, Shuigeng Zhou, Yang Wang, Jihong Guan, Zewei Wang, Jiaogen Zhou

doi:10.1186/s12859-017-1820-8

Abstract

Background: Predicting protein complexes from protein-protein interaction (PPI) networks has been studied for a decade. Various methods have been proposed to address challenging aspects of this problem, including overlapping clusters, high false positive/negative rates in PPI data, and diverse complex structures. It is well known that most current methods can effectively detect only complexes of size ≥ 3, which account for only about half of all known complexes. Recently, a method was proposed specifically for finding small complexes (size = 2 and 3) in PPI networks. However, there is so far no effective approach that can predict both small (size ≤ 3) and large (size > 3) complexes from PPI networks.

Results: In this paper, we propose a novel method, called CPredictor2.0, that can detect both small and large complexes under a unified framework. Concretely, we first group proteins of similar functions. Then, the Markov clustering algorithm is employed to discover clusters in each group. Finally, we merge all discovered clusters that overlap with each other to a certain degree; the merged clusters, together with the remaining clusters, constitute the set of detected complexes. Extensive experiments show that the new method predicts both small and large complexes more effectively than state-of-the-art methods.

Conclusions: The proposed method, CPredictor2.0, can be applied to accurately predict both small and large protein complexes.

Highlights

Predicting protein complexes from protein-protein interaction (PPI) networks has been studied for a decade
A PPI data set can be represented as a protein-protein interaction network (PIN) where nodes are proteins and edges signify the interactions between pairs of proteins
We first cluster proteins based on functional similarity calculated using Biological Process (BP) terms from Gene Ontology (GO) [29]; then, for each group, we find the subsets of proteins that are connected in the PIN
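
The last two highlights can be illustrated with a minimal sketch: a PIN stored as an adjacency map, and a breadth-first search that splits one functional group into the subsets that are connected in the PIN. The function names (`build_pin`, `connected_subsets`), the protein identifiers, and the example group are all hypothetical and chosen for illustration only; the functional grouping itself (GO-based similarity) is assumed to have been computed elsewhere, and this is not the authors' implementation.

```python
from collections import defaultdict, deque


def build_pin(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    pin = defaultdict(set)
    for a, b in interactions:
        if a != b:  # assumption: self-interactions are ignored in this sketch
            pin[a].add(b)
            pin[b].add(a)
    return pin


def connected_subsets(pin, group):
    """Split one functional group into subsets connected in the PIN.

    Only edges whose two endpoints are both in `group` are followed,
    i.e. the search runs on the subgraph induced by the group.
    A protein with no partner inside the group ends up as a singleton.
    """
    group = set(group)
    seen = set()
    subsets = []
    for start in sorted(group):  # sorted only to make the output order stable
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in pin.get(node, ()):
                if neighbor in group and neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        subsets.append(component)
    return subsets


# Toy data: P6 interacts with P3 but is outside the functional group.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P3", "P6")]
pin = build_pin(interactions)
group = ["P1", "P2", "P3", "P4", "P5"]
subsets = connected_subsets(pin, group)
print(subsets)
```

On this toy input the group splits into two connected subsets, one containing P1, P2 and P3 and one containing P4 and P5; P6 is excluded because it does not belong to the group.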

Summary

Results

We propose a novel method, called CPredictor2.0, that can detect both small and large complexes under a unified framework. We first group proteins of similar functions. The Markov clustering algorithm is then employed to discover clusters in each group. Finally, we merge all discovered clusters that overlap with each other to a certain degree; the merged clusters, together with the remaining clusters, constitute the set of detected complexes. Extensive experiments show that the new method predicts both small and large complexes more effectively than state-of-the-art methods.
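
The final merging step can be sketched as follows. The summary only says that clusters are merged when they overlap "to a certain degree", so both the overlap measure used here (the squared size of the intersection divided by the product of the cluster sizes) and the threshold values are assumptions made for illustration, not the settings of CPredictor2.0. The function names are likewise hypothetical, and the Markov clustering step that would produce the input clusters is not shown.

```python
def overlap_score(a, b):
    """Assumed overlap measure: |A ∩ B|^2 / (|A| * |B|), in [0, 1]."""
    common = len(a & b)
    return common * common / (len(a) * len(b))


def merge_overlapping(clusters, threshold=0.5):
    """Repeatedly merge any two clusters whose overlap reaches the threshold.

    Clusters that never reach the threshold with any other cluster are
    returned unchanged, so the result contains both merged and remaining
    clusters. The threshold default is an illustrative assumption.
    """
    merged = [set(c) for c in clusters if c]  # copy; skip empty clusters
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap_score(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged[j]  # absorb cluster j into cluster i
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the merged cluster has grown
    return merged


# Toy clusters: the first two share two of three proteins (score 4/9).
clusters = [{"A", "B", "C"}, {"B", "C", "D"}, {"E", "F"}]
result = merge_overlapping(clusters, threshold=0.4)
print(result)
```

With the threshold set to 0.4 the first two toy clusters are merged into one four-protein cluster and the pair {E, F} is kept as is, which shows how a small complex of size 2 can survive alongside a larger merged one. Restarting the scan after every merge is simple rather than fast; it is adequate for a sketch but would be worth replacing for large cluster sets.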

Background

Methods

Add cand into Cands

Results and discussion

Conclusion