Abstract

Background

A goal of systems biology is to analyze large-scale molecular networks, including gene expression and protein-protein interaction networks, and to reveal the relationships between network structures and their biological functions. Dividing a protein-protein interaction (PPI) network into naturally grouped parts is an essential way to investigate the relationship between the topology of a network and its functions. However, clear modular decomposition is often hard because of the heterogeneous or scale-free properties of PPI networks.

Methodology/Principal Findings

To address this problem, we propose a diffusion model-based spectral clustering algorithm, which treats the cluster structure of a PPI network as a random-walk problem in a diffusion process on the network and solves it analytically. To cope with the heterogeneity of the networks, a power factor is introduced that adjusts the diffusion matrix by weighting the transition (adjacency) matrix according to a node degree matrix. This algorithm is named adjustable diffusion matrix-based spectral clustering (ADMSC). To demonstrate the feasibility of ADMSC, we apply it to the decomposition of a yeast PPI network, identifying biologically significant clusters of approximately equal size. Compared with other established algorithms, ADMSC facilitates clear and fast decomposition of PPI networks.

Conclusions/Significance

ADMSC introduces a power factor that adjusts the diffusion matrix to the heterogeneity of PPI networks. ADMSC effectively partitions PPI networks into biologically significant clusters of almost equal size, while being very fast, robust, and appealingly simple.
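As a rough illustration of the method summarized above, the following Python sketch builds a degree-adjusted diffusion matrix, embeds the nodes with its leading eigenvectors, and clusters the embedding with complete linkage. The abstract does not state the exact matrix, so the form D^(-b) A (degree matrix D, adjacency matrix A, power factor b, where b = 1 gives the ordinary random-walk transition matrix) is an assumption. The function names, the eigenvalue weighting of the coordinates, and the toy graph are likewise hypothetical, and NumPy is the only dependency.

```python
import numpy as np

def adjusted_diffusion_embedding(adj, b=1.0, n_components=1):
    """Embed nodes with non-trivial eigenvectors of D^-b A (assumed form)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every node needs at least one edge")
    # S = D^(-b/2) A D^(-b/2) is symmetric and similar to D^-b A,
    # so both share the same real eigenvalues.
    scale = deg ** (-b / 2.0)
    sym = adj * scale[:, None] * scale[None, :]
    vals, vecs = np.linalg.eigh(sym)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]          # reorder: largest first
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of D^-b A are D^(-b/2) times those of S.
    right = vecs * scale[:, None]
    # Drop the leading eigenvector (constant when b = 1) and weight
    # the remaining coordinates by their eigenvalues.
    keep = slice(1, n_components + 1)
    return right[:, keep] * vals[keep]

def complete_linkage(points, k):
    """Naive agglomerative clustering with complete (farthest-pair) linkage."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster distance = largest pairwise member distance.
                d = dist[np.ix_(clusters[i], clusters[j])].max()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                     # j > i, so index i stays valid
    labels = np.empty(len(points), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

embedding = adjusted_diffusion_embedding(adj, b=1.0, n_components=1)
labels = complete_linkage(embedding, k=2)
print(labels)
```

On this toy graph the two triangles should fall into separate clusters. Under the assumed form, changing b changes how strongly high-degree nodes are down-weighted, which is the role the abstract gives to the power factor.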

Highlights

  • Because the complete linkage method gives higher modularity, higher speed, and a lower coefficient of variation (CV) of cluster size than the k-means method, it is selected for clustering the diffusion map of the protein-protein interaction (PPI) network

  • Using the b factor helps obtain the highest modularity and identify clusters with a small variation in size. Traditional clustering methods, which build clusters agglomeratively from similarity measures between pairs of nodes, struggle to partition heterogeneous PPI networks into clusters of approximately equal size
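The two quality measures named in these highlights, modularity and the coefficient of variation (CV) of cluster size, can be computed as in the minimal Python sketch below. It assumes the standard Newman-Girvan modularity for an undirected, unweighted graph and a CV based on the population standard deviation. The text here does not say which exact definitions the authors used, and the function names and toy graph are hypothetical.

```python
from statistics import mean, pstdev

def modularity(edges, labels):
    """Newman-Girvan modularity Q; labels[i] is the cluster of node i."""
    m = len(edges)
    intra = {}    # number of edges with both ends inside each cluster
    degree = {}   # summed node degree of each cluster
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # Q = sum over clusters of (intra-edge fraction - expected fraction).
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def size_cv(labels):
    """Coefficient of variation (population std / mean) of cluster sizes."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    counts = list(sizes.values())
    return pstdev(counts) / mean(counts)

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
balanced = [0, 0, 0, 1, 1, 1]     # one cluster per triangle
unbalanced = [0, 1, 1, 1, 1, 1]   # one node split off from the rest

for name, labels in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, modularity(edges, labels), size_cv(labels))
```

In this toy case the balanced partition has the higher modularity and a CV of zero. That is the combination the highlights describe as desirable: high modularity together with small variation in cluster size.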

Summary

Introduction

A goal of systems biology is to analyze large-scale molecular networks, including gene expression and protein-protein interaction networks, and to reveal the relationships between network structures and their biological functions. A common approach to network analysis is to partition the network into subnetworks responsible for specific biological functions. Because biological functions can be carried out by particular groups of genes and proteins, dividing a network into naturally grouped parts (clusters or communities) is an essential way to investigate the relationship between the function and topology of the network and to reveal hidden knowledge behind it. For protein-protein interaction (PPI) networks, however, clear modular decomposition is often hard because of their heterogeneous or scale-free properties.
