Abstract
Motivation: In recent years, Markov clustering (MCL) has emerged as an effective algorithm for clustering biological networks—for instance clustering protein–protein interaction (PPI) networks to identify functional modules. However, a limitation of MCL and its variants (e.g. regularized MCL) is that it only supports hard clustering often leading to an impedance mismatch given that there is often a significant overlap of proteins across functional modules.Results: In this article, we seek to redress this limitation. We propose a soft variation of Regularized MCL (R-MCL) based on the idea of iteratively (re-)executing R-MCL while ensuring that multiple executions do not always converge to the same clustering result thus allowing for highly overlapped clusters. The resulting algorithm, denoted soft regularized Markov clustering, is shown to outperform a range of extant state-of-the-art approaches in terms of accuracy of identifying functional modules on three real PPI networks.Availability: All data and codes are freely available upon request.Contact: srini@cse.ohio-state.eduSupplementary Information: Supplementary data are available at Bioinformatics online.
Highlights
Advances in technology have enabled scientists to determine, identify and validate pairwise protein interactions through a range of experimental approaches
Since R-Markov clustering (MCL) is very efficient in clustering a protein– protein interaction (PPI) network, which typically contains less than 10 000 nodes and 100 000 edges, and the difference between clusterings produced by each iteration should be so slight that every possible cluster is produced, we suggest that t is set to a large number from 10 to 50 and β is set to a relative small number (1.25 in default)
As most functional modules identification algorithms produce small clusters which can be identified as high-level Gene Ontology (GO) terms, we aim to evaluate the results only based on high-level GO terms
Summary
Advances in technology have enabled scientists to determine, identify and validate pairwise protein interactions through a range of experimental approaches. Several highthroughput approaches have produced a large scale of protein– protein interaction (PPI) datasets. These approaches include yeast two-hybrid, protein co-immunoprecipitation followed by mass spectrometry (MS), protein chip technologies and tandem affinity purification (TAP) with MS. Such data have led researchers to discover protein functions through PPI networks, in which a node represents a protein and an edge mimics an interaction between two proteins. A fundamental goal here is to discover functional modules or protein complexes in order to predict the function of unannotated proteins. Identifying functional modules is similar to detecting communities (clusters) in a network (graph). A number of ‘soft’ clustering algorithms have been recently proposed to identify functional modules in a PPI network, and they can be grouped into three categories
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.