Mutual information, phi-squared and model-based co-clustering for contingency tables

Gérard Govaert, Mohamed Nadif

doi:10.1007/s11634-016-0274-6

Abstract

Many of the datasets encountered in statistics are two-dimensional in nature and can be represented by a matrix. Classical clustering procedures seek to construct an optimal partition of the rows or, sometimes, of the columns, treating each dimension separately. In contrast, co-clustering methods cluster the rows and the columns simultaneously and organize the data into homogeneous blocks (after suitable permutations). Methods of this kind have practical importance in a wide variety of applications such as document clustering, where data are typically organized in two-way contingency tables. Our goal is to offer coherent frameworks for understanding some existing criteria and algorithms for co-clustering contingency tables, and to propose new ones. We look at two different frameworks for the problem of co-clustering. The first involves minimizing an objective function based on measures of association, in particular phi-squared and mutual information. The second uses a model-based co-clustering approach, and we consider two models: the block model and the latent block model. We establish connections between different approaches, criteria and algorithms, and we highlight a number of implicit assumptions in some commonly used algorithms. Our contribution is illustrated by numerical experiments on simulated and real datasets that show the relevance of the presented methods in the field of document clustering.
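
The association-based criteria mentioned above can be made concrete with a small example. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it takes a contingency table x, a row partition z into g clusters and a column partition w into m clusters, sums x into the g × m reduced table, and evaluates on its relative frequencies f_kl the phi-squared Φ²(z,w) = Σ_kl (f_kl − f_k. f_.l)² / (f_k. f_.l) and the mutual information I(z,w) = Σ_kl f_kl log(f_kl / (f_k. f_.l)). Because the mutual information of the original table is fixed, maximizing I(z,w) is equivalent to minimizing the loss of mutual information caused by the grouping. The function update_rows is a hypothetical helper showing one alternating step with w held fixed: each row i moves to the cluster k that maximizes Σ_l u_il log(f_kl / f_k.), where u_il is the mass of row i in column cluster l; this step cannot decrease I(z,w). All function names and the NumPy layout are assumptions made for illustration.

```python
import numpy as np


def reduced_table(x, z, w, g, m):
    """Sum counts x[i, j] into a g x m table using row labels z and column labels w."""
    rows = np.eye(g)[np.asarray(z)]  # n x g row-cluster indicator matrix
    cols = np.eye(m)[np.asarray(w)]  # d x m column-cluster indicator matrix
    return rows.T @ np.asarray(x, dtype=float) @ cols


def phi_squared(t):
    """Phi-squared of a table: sum of (f_kl - f_k. f_.l)^2 / (f_k. f_.l)."""
    f = t / t.sum()
    e = f.sum(axis=1, keepdims=True) * f.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(e > 0, (f - e) ** 2 / e, 0.0)  # empty margins contribute 0
    return float(terms.sum())


def mutual_information(t):
    """Mutual information (natural log) between the row and column variables of t."""
    f = t / t.sum()
    e = f.sum(axis=1, keepdims=True) * f.sum(axis=0, keepdims=True)
    nz = f > 0  # convention 0 log 0 = 0; f_kl > 0 implies e_kl > 0
    return float((f[nz] * np.log(f[nz] / e[nz])).sum())


def update_rows(x, z, w, g, m):
    """Hypothetical helper: one row-reassignment step that cannot decrease I(z, w) for fixed w."""
    f = np.asarray(x, dtype=float)
    f = f / f.sum()
    u = f @ np.eye(m)[np.asarray(w)]        # n x m: mass of row i in column cluster l
    fkl = np.eye(g)[np.asarray(z)].T @ u    # g x m reduced frequencies f_kl
    fk = fkl.sum(axis=1, keepdims=True)     # row-cluster margins f_k.
    with np.errstate(divide="ignore", invalid="ignore"):
        c = np.where(fk > 0, fkl / fk, 0.0)         # cluster profiles f_kl / f_k.
        logc = np.where(c > 0, np.log(c), -1e300)   # finite stand-in for log 0
    scores = u @ logc.T                     # scores[i, k] = sum_l u_il log(f_kl / f_k.)
    return scores.argmax(axis=1)


# Toy 4 x 4 table with two obvious row blocks and two column blocks.
x = np.array([[5, 5, 0, 0],
              [4, 6, 0, 0],
              [0, 0, 7, 3],
              [0, 0, 2, 8]])
w = np.array([0, 0, 1, 1])

t = reduced_table(x, [0, 0, 1, 1], w, 2, 2)
print(phi_squared(t), mutual_information(t))  # 1.0 and log(2) for this block-diagonal grouping

z0 = np.array([0, 0, 0, 1])                 # row 2 deliberately misplaced
z1 = update_rows(x, z0, w, 2, 2)
print(z1)                                   # row 2 moves to cluster 1
print(mutual_information(reduced_table(x, z0, w, 2, 2)),
      mutual_information(reduced_table(x, z1, w, 2, 2)))
```

A full co-clustering loop would alternate this row step with the symmetric column step (the same function applied to the transposed table with the roles of z and w swapped) until I(z,w) stops increasing. Like other alternating schemes, it only reaches a local optimum that depends on the starting partitions.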

Journal: Advances in Data Analysis and Classification
Publication Date: Nov 16, 2016
Citations: 32

Similar Papers

Latent Block Model for Contingency Table
Gérard Govaert ... Mohamed Nadif
Communications in Statistics - Theory and Methods | VOL. 39
13 Jan 2010

Model-Based Co-clustering for Continuous Data
Mohamed Nadif ... Gerard Govaert
01 Dec 2010

Goodness-of-fit test for latent block models
Chihiro Watanabe ... Taiji Suzuki
Computational Statistics & Data Analysis | VOL. 154
19 Sep 2020

Model-based co-clustering for mixed type data
Margot Selosse ... Christophe Biernacki
Computational Statistics & Data Analysis | VOL. 144
18 Oct 2019