Information-theoretic co-clustering

Inderjit S Dhillon, Dharmendra S Modha, Subramanyam Mallela

doi:10.1145/956750.956764

Abstract

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory. The optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high dimensionality.
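The abstract states the objective but not the update rules. As a hedged illustration, the Python sketch below (assuming NumPy) treats a table of counts as an empirical joint distribution, computes the mutual information I(X;Y) and the mutual information I(X̂;Ŷ) preserved by a given row and column clustering, and improves a co-clustering with a simple greedy local search. The search alternates between rows and columns and accepts only moves that raise I(X̂;Ŷ), so the preserved information never decreases. This search is a stand-in chosen for brevity, not the authors' algorithm, and the function names (`mutual_information`, `clustered_joint`, `preserved_mi`, `greedy_cocluster`) are hypothetical.

```python
import numpy as np


def mutual_information(p):
    """Mutual information I(X;Y) in bits of a 2-D joint distribution p (entries sum to 1)."""
    px = p.sum(axis=1, keepdims=True)   # row marginal p(x), shape (m, 1)
    py = p.sum(axis=0, keepdims=True)   # column marginal p(y), shape (1, n)
    mask = p > 0                        # terms with p(x, y) = 0 contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask])))


def clustered_joint(p, rows, cols, k_rows, k_cols):
    """Joint distribution of the cluster variables: sum p over each (row cluster, column cluster) block."""
    R = np.eye(k_rows)[rows]            # one-hot row memberships, shape (m, k_rows)
    C = np.eye(k_cols)[cols]            # one-hot column memberships, shape (n, k_cols)
    return R.T @ p @ C                  # shape (k_rows, k_cols)


def preserved_mi(p, rows, cols, k_rows, k_cols):
    """I(X_hat; Y_hat): mutual information retained after clustering rows and columns."""
    return mutual_information(clustered_joint(p, rows, cols, k_rows, k_cols))


def greedy_cocluster(counts, k_rows, k_cols, max_sweeps=20, seed=0):
    """Hypothetical greedy local search (not the paper's update rule): alternately
    move single rows and single columns, accepting a move only if it increases
    I(X_hat; Y_hat)."""
    p = counts / counts.sum()           # empirical joint distribution
    m, n = p.shape
    rng = np.random.default_rng(seed)
    rows = rng.integers(k_rows, size=m)
    cols = rng.integers(k_cols, size=n)
    best = preserved_mi(p, rows, cols, k_rows, k_cols)
    for _ in range(max_sweeps):
        improved = False
        # One sweep over the rows, then one over the columns (labels are updated in place).
        for labels, k in ((rows, k_rows), (cols, k_cols)):
            for i in range(len(labels)):
                current = labels[i]
                for c in range(k):
                    if c == current:
                        continue
                    labels[i] = c
                    score = preserved_mi(p, rows, cols, k_rows, k_cols)
                    if score > best + 1e-12:    # strict improvement only, so best never decreases
                        best, current, improved = score, c, True
                    else:
                        labels[i] = current     # undo the trial move
        if not improved:
            break                           # local optimum for single-element moves
    return rows, cols, best
```

For the block-diagonal table `np.kron(np.eye(2), np.ones((2, 2)))`, I(X;Y) is 1 bit, and the clustering that matches the blocks (`rows = cols = [0, 0, 1, 1]`) preserves all of it. Because clustering can only lose information, I(X̂;Ŷ) never exceeds I(X;Y). Single-element greedy moves can stall in poor local optima: if every row and every column starts in the same cluster, no single move changes I(X̂;Ŷ) from 0. Several random restarts are therefore advisable. This is a limitation of the sketch, not a claim about the paper's method.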

Similar Papers

Partition-Symmetrical Entropy Functions
Qi Chen ... Raymond W Yeung
IEEE Transactions on Information Theory | VOL. 62 | 01 Oct 2016

Co-clustering for queries and corresponding advertisement
Fan Yang ... Bin An
01 Jul 2009

Discussion of the Papers by Edwards, and Wermuth and Lauritzen
Journal of the Royal Statistical Society Series B: Statistical Methodology | VOL. 52 | 01 Sep 1990

On the Calculation of Mutual Information
Tyrone E Duncan
SIAM Journal on Applied Mathematics | VOL. 19 | 01 Jul 1970