Markov clustering of protein interaction networks with improved balance and scalability

Venu Satuluri,Duygu Ucar,Srinivasan Parthasarathy

doi:10.1145/1854776.1854812

Abstract

Markov Clustering (MCL) is a popular algorithm for clustering networks in bioinformatics such as protein-protein interaction networks and protein similarity networks. An important requirement when clustering protein networks is minimizing the number of big clusters, since it is generally understood that protein complexes tend not to have more than 15--30 nodes. Similarly, it is important to not output too many singleton clusters, since they do not provide much useful information. In this paper, we show how MCL may be modified so as to better respect these two requirements, while also taking the link structure in the graph into account. We design our algorithm on top of Regularized MCL (R-MCL) [16], a previously proposed modification of MCL. Our proposed variation computes a new regularization matrix at each iteration that penalizes big cluster sizes, with the size of the penalty being tunable using a balance parameter. This algorithm also naturally fits in a Multi level framework that allows great improvements in speed. We perform experiments on three real protein interaction networks and show significant improvements over MCL in quality, balance and execution speed.

Full Text