Abstract

Recent high‐throughput experiments have generated protein–protein interaction data on a genomic scale, yielding the complete interactome for several organisms. Various graph clustering algorithms have been applied to protein interaction networks to identify protein complexes and functional modules. Although previous algorithms are scalable and robust, their accuracy remains limited by the complex connectivity of protein interaction networks. In this study, we propose a novel information‐theoretic measure of the structural complexity of a graph, called graph entropy. A loss of graph entropy corresponds to an increase in the modularity of the graph. Based on this concept, we present a graph clustering algorithm that searches for local optima in modularity: it detects each optimal cluster by growing a seed in a manner that minimizes graph entropy. In experiments on the yeast interactome, the graph entropy approach predicts protein complexes and functional modules more accurately than the best competing method. To measure clustering accuracy as f‐scores and p‐scores, we statistically compared the output clusters with both known protein complexes and Gene Ontology annotations in the biological process and molecular function categories. Because the algorithm is also scalable, it can be applied to the larger human protein interaction network.
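The abstract does not spell out the definition of graph entropy or the exact growth procedure, so the following is only a minimal sketch under stated assumptions. It assumes that each vertex's entropy is the binary entropy of the fraction of its neighbours that lie inside the candidate cluster, that graph entropy is the sum of these vertex entropies over all vertices, and that a cluster is grown from a seed in two phases: first drop seed neighbours whose removal lowers graph entropy, then repeatedly add boundary vertices that lower it. The function names (`binary_entropy`, `graph_entropy`, `grow_cluster`) and the seed-plus-neighbours initialization are hypothetical illustrations, not the authors' published implementation.

```python
from math import log2


def binary_entropy(p):
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) taken as 0.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def graph_entropy(adj, cluster):
    # ASSUMED definition: sum over ALL vertices of the binary entropy of the
    # fraction of each vertex's neighbours that fall inside `cluster`.
    total = 0.0
    for v, nbrs in adj.items():
        if not nbrs:
            continue
        inner = sum(1 for u in nbrs if u in cluster)
        total += binary_entropy(inner / len(nbrs))
    return total


def grow_cluster(adj, seed):
    # ASSUMED initialization: the seed together with its direct neighbours.
    cluster = {seed} | set(adj[seed])
    best = graph_entropy(adj, cluster)

    # Phase 1: drop seed neighbours whose removal lowers graph entropy.
    for v in sorted(adj[seed]):
        trial = cluster - {v}
        e = graph_entropy(adj, trial)
        if e < best:
            cluster, best = trial, e

    # Phase 2: greedily add boundary vertices while doing so lowers entropy.
    improved = True
    while improved:
        improved = False
        boundary = {u for v in cluster for u in adj[v]} - cluster
        for u in sorted(boundary):
            trial = cluster | {u}
            e = graph_entropy(adj, trial)
            if e < best:
                cluster, best = trial, e
                improved = True
    return cluster, best


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(grow_cluster(adj, 0))
    print(grow_cluster(adj, 4))
```

On two triangles joined by one edge, growing from vertex 0 keeps the triangle {0, 1, 2}, because adding vertex 3 would raise the entropy of vertices 4 and 5. This sketch recomputes the full sum for every candidate move, which costs time proportional to the number of edges; a scalable version would presumably update only the vertices adjacent to the vertex being added or removed.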
