Prediction of problematic complexes from PPI networks: sparse, embedded, and small complexes.

Chern Han Yong, Limsoon Wong

doi:10.1186/s13062-015-0067-4

Open Access

https://doi.org/10.1186/s13062-015-0067-4

Journal: Biology Direct
Publication Date: Aug 1, 2015
Citations: 38
License type: CC BY 4.0

Affiliation: National University of Singapore

Abstract

Background
The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics. Three groups of complexes have been identified as problematic to discover. First, many complexes are sparsely connected in the PPI network and do not form the dense clusters that clustering algorithms search for. Second, many complexes are embedded within highly connected regions of the PPI network, which makes it difficult to delimit their boundaries accurately. Third, many complexes are small (composed of two or three distinct proteins), so traditional topological markers such as density are ineffective.

Results
We have previously proposed three approaches to address these challenges. First, Supervised Weighting of Composite Networks (SWC) integrates diverse data sources with supervised weighting and fills in missing co-complex edges in sparse complexes, allowing them to be predicted. Second, network decomposition (DECOMP) splits the PPI network into spatially and temporally coherent subnetworks, so that complexes embedded within highly connected regions can be demarcated more clearly. Finally, Size-Specific Supervised Weighting (SSS) integrates diverse data sources with supervised learning to weight each edge in a size-specific manner, according to its likelihood of belonging to a small complex versus a large complex, and thereby improves the prediction of small complexes. Here we integrate these three approaches into a single system. We test the integrated approach on the prediction of yeast and human complexes and show that it outperforms each of SWC, DECOMP, and SSS when run individually, achieving the highest precision and recall.

Conclusion
Three groups of protein complexes remain challenging to predict from PPI data: sparse complexes, embedded complexes, and small complexes. Our previous approaches addressed each of these challenges individually, through data integration, PPI-network decomposition, and supervised learning. Here we integrate these approaches into a single complex-discovery system, which improves the prediction of all three types of challenging complexes. With our approach, protein complexes can be predicted more accurately and comprehensively, allowing a clearer elucidation of the modular machinery of the cell.

Reviewers
This article was reviewed by Prof. Masanori Arita and Dr. Yang Liu (nominated by Prof. Charles DeLisi).
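The size-specific weighting idea (SSS) summarized in the abstract can be illustrated with a minimal sketch under stated assumptions. For illustration only, each candidate edge is assumed to carry a tuple of discretized evidence features (one per data source), and a set of reference complexes is assumed to be available for training. Edges are labeled "small" if their two proteins co-occur in a reference complex of two or three proteins, "large" if they co-occur in a bigger one, and "none" otherwise. A naive Bayes classifier with Laplace smoothing is swapped in here as a simple supervised learner; the function names (`label_edges`, `train_naive_bayes`, `posterior`) and the toy features are hypothetical, and this is not presented as the exact model used by SSS.

```python
from collections import Counter, defaultdict
from itertools import combinations

SMALL_MAX = 3  # complexes of two or three proteins count as "small"


def edge_key(a, b):
    """Canonical, order-independent key for an undirected edge."""
    return tuple(sorted((a, b)))


def label_edges(edges, complexes, small_max=SMALL_MAX):
    """Label each edge 'small', 'large' or 'none' using reference complexes.

    A pair found in both a small and a large complex is labeled 'small'.
    """
    small_pairs, large_pairs = set(), set()
    for cx in complexes:
        target = small_pairs if len(cx) <= small_max else large_pairs
        for a, b in combinations(sorted(cx), 2):
            target.add((a, b))
    labels = {}
    for a, b in edges:
        key = edge_key(a, b)
        if key in small_pairs:
            labels[key] = "small"
        elif key in large_pairs:
            labels[key] = "large"
        else:
            labels[key] = "none"
    return labels


def train_naive_bayes(features, labels):
    """Count class frequencies and per-feature value frequencies per class.

    features: edge key -> tuple of discrete feature values.
    """
    class_counts = Counter(labels.values())
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values_seen = defaultdict(set)       # feature index -> observed values
    for key, cls in labels.items():
        for i, v in enumerate(features[key]):
            value_counts[(cls, i)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen


def posterior(model, x, cls, alpha=1.0):
    """P(cls | x) under naive Bayes with Laplace smoothing (alpha)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for c, n in class_counts.items():
        p = n / total  # class prior
        for i, v in enumerate(x):
            k = max(len(values_seen[i]), 1)  # number of distinct values of feature i
            p *= (value_counts[(c, i)][v] + alpha) / (n + alpha * k)
        scores[c] = p
    return scores.get(cls, 0.0) / sum(scores.values())


# Toy example with hypothetical binary features: (co-expressed, co-localized).
complexes = [{"A", "B"}, {"C", "D", "E", "F"}]
edges = [("A", "B"), ("C", "D"), ("D", "E"), ("A", "C")]
labels = label_edges(edges, complexes)
features = {("A", "B"): (1, 1), ("C", "D"): (1, 0), ("D", "E"): (1, 0), ("A", "C"): (0, 0)}
model = train_naive_bayes(features, labels)
for cls in ("small", "large", "none"):
    print(cls, round(posterior(model, (1, 1), cls), 3))
```

In this sketch, the posterior for "small" could serve as an edge weight in a network used to search for small complexes, and the posterior for "large" as a weight for large-complex prediction; keeping the two apart is what makes the weighting size-specific.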

Highlights

The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics.
Many complexes are sparsely connected in the PPI network and cannot be picked out by clustering algorithms that search for dense subgraphs.
Many complexes are embedded within highly connected regions of the PPI network, with many extraneous edges connecting them to external proteins, so clustering algorithms cannot properly delimit their boundaries.
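One way to remove the extraneous edges around an embedded complex is to split the network into spatially coherent subnetworks, in the spirit of DECOMP. The sketch below assumes, for illustration, that each protein is annotated with a set of labels (here hypothetical localization compartments) and keeps an edge in a label's subnetwork only when both endpoints carry that label. The function name `decompose_by_annotation` and the annotation format are assumptions; DECOMP also uses temporal information, and its actual procedure may differ.

```python
from collections import defaultdict


def decompose_by_annotation(edges, annotations):
    """Split a PPI network into one subnetwork per annotation label.

    edges: iterable of (protein_a, protein_b) pairs.
    annotations: dict mapping protein -> set of labels (hypothetical
        localization compartments here; cell-cycle phases would work the same way).
    An edge is kept in a label's subnetwork only if both endpoints carry that
    label; edges whose endpoints share no label appear in no subnetwork.
    """
    subnetworks = defaultdict(set)
    for a, b in edges:
        shared = annotations.get(a, set()) & annotations.get(b, set())
        for label in shared:
            subnetworks[label].add(tuple(sorted((a, b))))
    return dict(subnetworks)


# Toy network: complex A-B-C sits in the nucleus; C also has a cytoplasmic partner X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("X", "Y"), ("B", "Y")]
annotations = {
    "A": {"nucleus"},
    "B": {"nucleus"},
    "C": {"nucleus", "cytoplasm"},
    "X": {"cytoplasm"},
    "Y": {"cytoplasm"},
}
subnetworks = decompose_by_annotation(edges, annotations)
for label, sub in sorted(subnetworks.items()):
    print(label, sorted(sub))
```

In the full toy network, the triangle A, B, C is linked to X and Y by the C-X and B-Y edges; in the nucleus subnetwork it stands alone, so its boundary is easier to delimit.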

Summary

Introduction

The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics. Many approaches have been proposed to derive complexes from PPI data, typically by searching for dense clusters in the PPI network that correspond to groups of interacting proteins. However, three groups of complexes are problematic for such approaches. Many complexes are sparsely connected in the PPI network and cannot be picked out by clustering algorithms that search for dense subgraphs. Many complexes are embedded within highly connected regions of the PPI network, with many extraneous edges connecting them to external proteins, so clustering algorithms cannot properly delimit their boundaries. Many complexes are small (that is, composed of two or three proteins), which makes measures of important topological features, such as density, ineffectual.
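To see why density is uninformative for small complexes, consider the usual definition of cluster density: the number of edges present among a cluster's proteins divided by the number of possible pairs, n(n-1)/2. A two-protein cluster can only have density 0 or 1, and a three-protein cluster only 0, 1/3, 2/3, or 1, so any interacting pair already looks like a perfectly dense cluster. The sketch below computes this standard density with exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def density(nodes, edges):
    """Fraction of possible protein pairs in `nodes` that are joined by an edge."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return Fraction(0)
    edge_set = {frozenset(e) for e in edges}
    present = sum(1 for a, b in combinations(nodes, 2) if frozenset((a, b)) in edge_set)
    return Fraction(present, n * (n - 1) // 2)


def attainable_densities(n):
    """All density values a cluster of n proteins can take."""
    possible = n * (n - 1) // 2
    return [Fraction(k, possible) for k in range(possible + 1)]


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
print(density(["A", "B"], edges))            # any interacting pair scores 1
print(density(["A", "B", "C", "D"], edges))  # 5 of 6 possible edges
print(attainable_densities(2), attainable_densities(3))
```

Because every interacting pair attains the maximum density, a density threshold cannot separate genuine two-protein complexes from arbitrary interacting pairs; distinguishing them needs additional evidence beyond topology.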

Similar Papers

Integrating experimental and literature protein-protein interaction data for protein complex prediction.
Yijia Zhang ... Hongfei Lin
BMC Genomics | Vol. 16, Suppl. 2 | 21 Jan 2015

A method for predicting protein complex in dynamic PPI networks.
Yijia Zhang ... Yiwei Liu
BMC Bioinformatics | Vol. 17, Suppl. 7 | 01 Jul 2016

PRINCESS, a Protein Interaction Confidence Evaluation System with Multiple Data Sources
Dong Li ... Fuchu He
Molecular & Cellular Proteomics | Vol. 7 | 01 Jun 2008

A random walk based approach for improving protein-protein interaction network and protein complex prediction
Chengwei Lei ... Jianhua Ruan
01 Oct 2012