Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates

Xiang Cheng, Lixin Gao, Jiangtao Yin, Sen Su

doi:10.1109/tkde.2015.2451634

Abstract

Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed environment. Both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD converge much faster and often obtain better results than their traditional concurrent counterparts.
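
The idea of sequential updates mentioned in the abstract can be illustrated on plain k-means, the EM-style example the abstract itself names. The following is a minimal single-machine sketch, not the Co-ClusterD implementation and not an AMCC algorithm: the function names (`sq_dist`, `kmeans_sequential`), the round-robin initialization, and the `max_sweeps` parameter are all assumptions made for illustration. Instead of reassigning every point and only then recomputing centroids (the concurrent scheme), the sketch refreshes the two affected centroids immediately after each single reassignment.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sequential(points, k, max_sweeps=100):
    """k-means in which centroids change right after each reassignment.

    Hypothetical helper for illustration; initialization is round-robin
    (point i starts in cluster i % k), which is an assumption of this sketch.
    """
    n, dim = len(points), len(points[0])
    if n < k:
        raise ValueError("need at least k points")

    assign = [i % k for i in range(n)]
    # Keep per-cluster coordinate sums and counts so that a centroid can be
    # refreshed in O(dim) when one point moves.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assign):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]

    def centroid(c):
        return [s / counts[c] for s in sums[c]]

    for _ in range(max_sweeps):
        moved = 0
        for i, p in enumerate(points):
            old = assign[i]
            if counts[old] == 1:
                continue  # never empty a cluster
            dists = [sq_dist(p, centroid(c)) for c in range(k)]
            best = min(range(k), key=dists.__getitem__)
            if dists[best] < dists[old]:
                # Sequential update: the statistics of both clusters change
                # now, so the next point already sees the new centroids.
                for d in range(dim):
                    sums[old][d] -= p[d]
                    sums[best][d] += p[d]
                counts[old] -= 1
                counts[best] += 1
                assign[i] = best
                moved += 1
        if moved == 0:
            break  # a full sweep without reassignment: converged
    return assign, [centroid(c) for c in range(k)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans_sequential(data, k=2)
    print(labels, centers)
```

In this sketch a point moves only to a strictly nearer centroid and never leaves a singleton cluster, so each move lowers the within-cluster sum of squares and the loop stops after finitely many sweeps. A concurrent variant would compute all assignments against a frozen set of centroids and update them once per sweep.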

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Dec 1, 2015
Citations: 32

Similar Papers

Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
Sen Su ... Lixin Gao
01 Dec 2013

HICC: an entropy splitting-based framework for hierarchical co-clustering
Wei Cheng ... Wei Wang
Knowledge and Information Systems | VOL. 46
10 Feb 2015

OvNMTF Algorithm: an Overlapping Non-Negative Matrix Tri-Factorization for Coclustering
Waldyr L De Freitas ... Lucas Fernandes Brunialti
01 Jul 2020

Accelerating Expectation-Maximization Algorithms with Frequent Updates
Jiangtao Yin ... Yanfeng Zhang
01 Sep 2012
