Abstract

Simultaneous clustering of rows and columns, usually called bi-clustering, co-clustering or block clustering, is an important technique in two-way data analysis. A new standard and efficient approach based on the latent block model has recently been proposed [Govaert and Nadif (2003)]; it addresses the block clustering problem on both the individual and variable sets. This article presents blockcluster, our R package for co-clustering binary, contingency and continuous data, which is based on these models. We give a brief review of model-based block clustering methods and show how the R package blockcluster can be used for co-clustering.

Highlights

  • Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth

  • Co-clustering has found numerous applications in fields such as data mining, information retrieval, biology and computer vision. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content using bipartite spectral graph partitioning

  • To study the block clustering problem, the previous formulation (2) is extended to a block mixture model defined by the probability density function f(x; θ) = Σ_{u ∈ U} p(u; θ) f(x | u; θ), where U denotes the set of all possible labellings of I × J and θ contains all the unknown parameters of the model
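
The sum over labellings in the block mixture density can be made concrete with a brute-force sketch. The Python code below is illustrative only and is not part of blockcluster. It assumes binary data, a labelling u = (z, w) made of independent row labels z and column labels w, and Bernoulli-distributed cells within each block; the names block_mixture_density, pi, rho and alpha are hypothetical. It enumerates every labelling, so it is only feasible for tiny matrices.

```python
from itertools import product
import math


def block_mixture_density(x, pi, rho, alpha):
    """Evaluate f(x; theta) = sum over u in U of p(u; theta) * f(x | u; theta).

    Assumptions (not taken from the text above):
      x     : binary matrix given as a list of rows
      pi    : row-cluster proportions, length g
      rho   : column-cluster proportions, length m
      alpha : g x m matrix of Bernoulli parameters, one per block
    """
    n, d = len(x), len(x[0])
    g, m = len(pi), len(rho)
    total = 0.0
    # Enumerate every row labelling z and column labelling w, i.e. every u in U.
    for z in product(range(g), repeat=n):
        p_z = math.prod(pi[k] for k in z)
        for w in product(range(m), repeat=d):
            p_u = p_z * math.prod(rho[l] for l in w)  # p(u; theta)
            f_x_given_u = 1.0  # f(x | u; theta): product of per-cell Bernoulli terms
            for i in range(n):
                for j in range(d):
                    a = alpha[z[i]][w[j]]
                    f_x_given_u *= a if x[i][j] == 1 else 1.0 - a
            total += p_u * f_x_given_u
    return total


# Example: a 2 x 3 binary matrix with 2 row clusters and 2 column clusters.
x = [[1, 1, 0],
     [0, 0, 1]]
pi = [0.6, 0.4]
rho = [0.5, 0.5]
alpha = [[0.9, 0.1],
         [0.2, 0.8]]
print(block_mixture_density(x, pi, rho, alpha))
```

With g row clusters, m column clusters and an n × d matrix, the sum has g^n · m^d terms, so direct evaluation quickly becomes intractable; this is presumably why iterative procedures such as the BEM and BCEM algorithms mentioned in the Introduction are used in practice.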


Summary

Introduction

Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content (words) using bipartite spectral graph partitioning. This is quite a useful technique, for instance for managing a huge corpus of unlabeled documents. The R package blockcluster estimates the parameters of the co-clustering models [Govaert and Nadif (2003)] for binary, contingency and continuous data. The package is distinctive in the generative models it implements (latent block models) and in the algorithms it uses (BEM, BCEM); in addition, special attention has been given to designing the library to handle very large data sets in reasonable time.

Mixture models
Latent block model
Model parameters estimation
Algorithms
Strategy for parameters estimation
Model initialization
Examples with simulated datasets
Image segmentation
Document clustering
Conclusion and future directions
Acknowledgement
The core library
Library structure