A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition.

Feng Pan, Wei Wang, Xiang Zhang

doi:10.1109/icde.2008.4497548

Abstract

Simultaneously clustering the columns and rows of a large data matrix (co-clustering) is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n (if not higher). This limits their applicability to data matrices with a large number of columns and rows. Moreover, existing co-clustering methods implicitly assume that the whole data matrix is held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets using recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be in main memory. Extensive experimental results on synthetic and several well-known real-life datasets show that CRD achieves accuracy competitive with existing co-clustering methods at much lower computational cost.
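To make the idea of co-clustering through sampled rows and columns concrete, here is a minimal sketch. It is not the authors' CRD algorithm, whose details the abstract does not give. It assumes norm-weighted sampling (a common choice in sampling-based matrix decompositions) and swaps in a plain k-means as the clustering step. All function and parameter names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def cocluster_by_sampling(matrix, k_rows, k_cols,
                          n_sampled_cols, n_sampled_rows, seed=0):
    """Cluster rows on a few sampled columns, and columns on a few sampled rows.

    Assumption (not from the paper): rows and columns are sampled with
    probability proportional to their squared norms.
    """
    rng = random.Random(seed)
    m, n = len(matrix), len(matrix[0])
    # One pass over the data to get the sampling weights.
    row_w = [sum(v * v for v in row) for row in matrix]
    col_w = [sum(matrix[i][j] ** 2 for i in range(m)) for j in range(n)]
    cols = rng.choices(range(n), weights=col_w, k=n_sampled_cols)
    rows = rng.choices(range(m), weights=row_w, k=n_sampled_rows)
    # Each row is described only by the sampled columns, and vice versa,
    # so clustering works on m x c and n x r tables instead of m x n.
    row_feats = [[matrix[i][j] for j in cols] for i in range(m)]
    col_feats = [[matrix[i][j] for i in rows] for j in range(n)]
    return kmeans(row_feats, k_rows), kmeans(col_feats, k_cols)
```

In this sketch, the clustering steps touch only the sampled columns (for rows) and the sampled rows (for columns), so for fixed sample sizes their cost grows linearly with m and n. Computing the sampling weights still needs one pass over the full matrix, which could be streamed rather than held in memory.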



Journal: Proceedings. ACM-SIGMOD International Conference on Management of Data
Publication Date: Apr 1, 2008
Citations: 22

Similar Papers

CRD
Feng Pan ... Xiang Zhang
09 Jun 2008

A statistical package for the Hewlett-Packard 2000/Access
Michael D Biderman
Behavior Research Methods & Instrumentation | VOL. 10
01 May 1978

Exemplar-based large-scale low-rank matrix decomposition for collaborative prediction
Hengxin Lei ... Yong Yu
International Journal of Computer Mathematics | VOL. 100
28 Oct 2022

OvNMTF Algorithm: an Overlapping Non-Negative Matrix Tri-Factorization for Coclustering
Waldyr L De Freitas ... Sarajane M Peres
01 Jul 2020
