Blockcluster: An R Package for Model-Based Co-Clustering

Parmeet Singh Bhatia, Gérard Govaert, Serge Iovleff

doi:10.18637/jss.v076.i09

Abstract

Simultaneous clustering of rows and columns, usually called bi-clustering, co-clustering or block clustering, is an important technique in two-way data analysis. A new, standard and efficient approach based on the latent block model [Govaert and Nadif (2003)] has recently been proposed, which takes into account the block clustering problem on both the individual and variable sets. This article presents blockcluster, our R package for co-clustering of binary, contingency and continuous data based on these models. We briefly review model-based block clustering methods and show how the R package blockcluster can be used for co-clustering.

Highlights

Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth.
Co-clustering has found numerous applications in fields ranging from data mining and information retrieval to biology and computer vision. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content using bipartite spectral graph partitioning.
To study the block clustering problem, the previous formulation (2) is extended to a block mixture model defined by the probability density function $f(x; \theta) = \sum_{u \in U} p(u; \theta) f(x \mid u; \theta)$, where $U$ denotes the set of all possible labellings of $I \times J$ and $\theta$ contains all the unknown parameters of this model.
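The block mixture density above can be made concrete on a toy binary matrix. The following is a minimal sketch, not the package's implementation. It assumes the density is a sum over all labellings u in U, that each labelling splits into a row labelling z and a column labelling w drawn independently with proportions `pi` and `rho`, and that each cell is Bernoulli with a block-specific parameter `alpha[k][l]`. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import itertools
import math


def bernoulli_block_density(x, z, w, alpha):
    """f(x | u; theta): one Bernoulli term per cell, given labels u = (z, w)."""
    prob = 1.0
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            a = alpha[z[i]][w[j]]  # parameter of the block containing cell (i, j)
            prob *= a if value == 1 else 1.0 - a
    return prob


def label_probability(z, w, pi, rho):
    """p(u; theta), assuming independent row and column labels."""
    return math.prod(pi[k] for k in z) * math.prod(rho[l] for l in w)


def block_mixture_density(x, pi, rho, alpha):
    """f(x; theta) as a brute-force sum over every labelling u in U."""
    n_rows, n_cols = len(x), len(x[0])
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=n_rows):
        for w in itertools.product(range(len(rho)), repeat=n_cols):
            total += label_probability(z, w, pi, rho) * bernoulli_block_density(x, z, w, alpha)
    return total


# Hypothetical parameters: 2 row clusters, 2 column clusters.
pi = [0.6, 0.4]                    # row-cluster proportions (assumed)
rho = [0.5, 0.5]                   # column-cluster proportions (assumed)
alpha = [[0.9, 0.1], [0.2, 0.7]]   # Bernoulli parameter of each block (assumed)

x = [[1, 0], [0, 1]]               # toy 2 x 2 binary data matrix
density = block_mixture_density(x, pi, rho, alpha)
print(density)
```

With g row clusters, m column clusters, n rows and d columns, the sum runs over g^n * m^d labellings, so direct evaluation is only feasible for toy sizes like this one.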

Summary

Introduction

Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content (words) using bipartite spectral graph partitioning. This is quite a useful technique, for instance, for managing a huge corpus of unlabeled documents. The R package blockcluster estimates the parameters of the co-clustering models [Govaert and Nadif (2003)] for binary, contingency and continuous data. This package is unique in terms of the generative models it implements (latent blocks) and the algorithms it uses (BEM, BCEM); in addition, special attention has been given to designing the library to handle very large data sets in reasonable time.

Mixture models

Latent block model

Model parameters estimation

Algorithms

Strategy for parameters estimation

Model initialization

Examples with simulated datasets

Image segmentation

Document clustering

Conclusion and future directions

Acknowledgement

The core library

Library structure


Journal: Journal of Statistical Software	Publication Date: Jan 1, 2017
Citations: 23	License type: cc-by


Similar Papers

Block clustering with collapsed latent block models
Jason Wyse ... Nial Friel
Statistics and Computing | VOL. 22
05 May 2011

An Approximation of the Integrated Classification Likelihood for the Latent Block Model
Aurore Lomet ... Gerard Govaert
01 Dec 2012

Block clustering with Bernoulli mixture models: Comparison of different approaches
Gérard Govaert ... Mohamed Nadif
Computational Statistics & Data Analysis | VOL. 52
20 Sep 2007

Model-Based Co-clustering for Continuous Data
Mohamed Nadif ... Gerard Govaert
01 Dec 2010
