Abstract

BackgroundGraphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks. Network clustering is a valuable approach for summarizing the structure in large networks, for predicting unobserved interactions, and for predicting functional annotations. Many current clustering algorithms suffer from a common set of limitations: poor resolution of top-level clusters; over-splitting of bottom-level clusters; requirements to pre-define the number of clusters prior to analysis; and an inability to jointly cluster over multiple interaction types.ResultsA new algorithm, Hierarchical Agglomerative Clustering (HAC), is developed for fast clustering of heterogeneous interaction networks. This algorithm uses maximum likelihood to drive the inference of a hierarchical stochastic block model for network structure. Bayesian model selection provides a principled method for collapsing the fine-structure within the smallest groups, and for identifying the top-level groups within a network. Model scores are additive over independent interaction types, providing a direct route for simultaneous analysis of multiple interaction types. In addition to inferring network structure, this algorithm generates link predictions that with cross-validation provide a quantitative assessment of performance for real-world examples.ConclusionsWhen applied to genome-scale data sets representing several organisms and interaction types, HAC provides the overall best performance in link prediction when compared with other clustering methods and with model-free graph diffusion kernels. Investigation of performance on genome-scale yeast protein interactions reveals roughly 100 top-level clusters, with a long-tailed distribution of cluster sizes. These are in turn partitioned into 1000 fine-level clusters containing 5 proteins on average, again with a long-tailed size distribution. Top-level clusters correspond to broad biological processes, whereas fine-level clusters correspond to discrete complexes. Surprisingly, link prediction based on joint clustering of physical and genetic interactions performs worse than predictions based on individual data sets, suggesting a lack of synergy in current high-throughput data.

Highlights

  • Graphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks

  • Additional genetic interaction data sets were collected from genome-wide Synthetic Gene Array (SGA) [23] and diploid-based Synthetic Lethality Analysis on Microarray [24]

  • The hierarchical agglomerative clustering methods Hierarchical Agglomerative Clustering (HAC)-ML is effective at discovering structure in realworld networks, with the ability to resolve both toplevel and bottom-level groups

Read more

Summary

Introduction

Graphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks. Graphs or networks provide an excellent organizing framework for representing data from high-throughput experiments that measure interactomes, or genomescale biological interactions: physical interactions between proteins; genetic interactions or specific phenotypes such as synthetic lethality between genes; gene regulation interactions between transcription factors and. An important current challenge is to develop methods to analyze these and other networks, such as social networks [3]. A second challenge is to predict links that might exist but which are not represented in the data. Missing links are prevalent in biological interactomes, where over half the true interactions may be absent from current data sets, and where spurious interactions may overwhelm true interactions in raw data [4]. Models based only on degree distribution have been unable to predict missing interactions [6]

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.