Abstract

Most proteins perform their biological functions while interacting as complexes. The detection of protein complexes is an important task not only for understanding the relationship between functions and structures of biological network, but also for predicting the function of unknown proteins. We present a new nodal metric by integrating its local topological information. The metric reflects its representability in a larger local neighborhood to a cluster of a protein interaction (PPI) network. Based on the metric, we propose a seed-expansion graph clustering algorithm (SEGC) for protein complexes detection in PPI networks. A roulette wheel strategy is used in the selection of the seed to enhance the diversity of clustering. For a candidate node u, we define its closeness to a cluster C, denoted as NC(u, C), by combing the density of a cluster C and the connection between a node u and C. In SEGC, a cluster which initially consists of only a seed node, is extended by adding nodes recursively from its neighbors according to the closeness, until all neighbors fail the process of expansion. We compare the F-measure and accuracy of the proposed SEGC algorithm with other algorithms on Saccharomyces cerevisiae protein interaction networks. The experimental results show that SEGC outperforms other algorithms under full coverage.

Highlights

  • In the proteomics era, various high throughput experimental techniques and computational methods have produced enormous protein interactions data [1], which have contributed to predict protein function [2,3] and detect protein complexes from protein–protein interaction (PPI) networks [4].Prediction of protein complexes can help to understand principles of cellular organization and biological functions of proteins [5,6,7]

  • We address the above limits and propose a new seed-expansion graph clustering algorithm (SEGC) that produces overlapped clusters for protein complex detection

  • The results show that SEGC outperforms other algorithms under full coverage in terms of both F-measure and accuracy with a real benchmark protein complex data set

Read more

Summary

Introduction

Various high throughput experimental techniques and computational methods have produced enormous protein interactions data [1], which have contributed to predict protein function [2,3] and detect protein complexes from protein–protein interaction (PPI) networks [4]. The Molecular Complex Detection (MCODE) algorithm [21] is one of the most classical seed expansion computational methods that can identify densely connected clusters in PPI networks. It first weights all nodes by their k-core neighborhood density as local network density, and expands from highest weighted node by adding nodes whose vertex weight percentage (VWP, weight percentage away from the weight of the seed vertex) is above a given threshold. The results show that SEGC outperforms other algorithms under full coverage in terms of both F-measure and accuracy with a real benchmark protein complex data set

Preliminary
Algorithm Overview
Node Weighing
Seed Selection
Cluster Expansion
Complexity
PPI Datasets and Metrics
Parameter Setting
Effectiveness of Our Strategies
Comparison with Other Algorithms
Stability of SEGC
Examples of Predicted Complexes
Examples of of predicted standardcomplexes: complexes
Conclusions
Findings
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.