Abstract
Simultaneous clustering of rows and columns, usually called bi-clustering, co-clustering or block clustering, is an important technique in two-way data analysis. A new standard and efficient approach based on the latent block model has recently been proposed [Govaert and Nadif (2003)]; it addresses the block clustering problem on both the individual and variable sets. This article presents blockcluster, our R package for co-clustering of binary, contingency and continuous data, which is based on these models. In this document, we give a brief review of model-based block clustering methods and show how the R package blockcluster can be used for co-clustering.
Highlights
Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth.
Co-clustering has found numerous applications in fields ranging from data mining and information retrieval to biology and computer vision. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content using bipartite spectral graph partitioning.
To study the block clustering problem, the previous formulation (2) is extended to propose the block mixture model, defined by the probability density function $f(x; \theta) = \sum_{u \in U} p(u; \theta)\, f(x \mid u; \theta)$, where $U$ denotes the set of all possible labellings of $I \times J$ and $\theta$ contains all the unknown parameters of this model.
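As an illustration only, the block mixture density can be evaluated by brute force on a tiny binary matrix, reading it as a sum over every labelling u in U of p(u; θ) f(x | u; θ). The sketch below is not taken from blockcluster. It assumes a Bernoulli block model in which a labelling is a pair of row labels and column labels drawn independently with proportions `pi` and `rho`, and each cell is Bernoulli with a block-specific parameter `alpha[k][l]`. The function name and all parameter values are hypothetical.

```python
import itertools
import math


def block_mixture_density(x, pi, rho, alpha):
    """Brute-force f(x; theta) as a sum over all row/column labellings.

    Assumed (hypothetical) parameterisation of theta:
      pi[k]       row-cluster proportions
      rho[l]      column-cluster proportions
      alpha[k][l] Bernoulli parameter of block (k, l)
    """
    n, d = len(x), len(x[0])
    g, m = len(pi), len(rho)
    total = 0.0
    # Enumerate every labelling u = (z, w) of rows and columns.
    for z in itertools.product(range(g), repeat=n):
        p_z = math.prod(pi[k] for k in z)
        for w in itertools.product(range(m), repeat=d):
            # p(u; theta): independent row and column labels (assumption).
            p_u = p_z * math.prod(rho[l] for l in w)
            # f(x | u; theta): cells are independent given the labelling.
            cond = 1.0
            for i in range(n):
                for j in range(d):
                    a = alpha[z[i]][w[j]]
                    cond *= a if x[i][j] == 1 else 1.0 - a
            total += p_u * cond
    return total


# Hypothetical parameters for a 2 x 2 block structure.
pi = [0.5, 0.5]
rho = [0.4, 0.6]
alpha = [[0.9, 0.1], [0.2, 0.8]]

# Sanity check: summing the density over all 2 x 2 binary matrices
# should give a value very close to 1.
all_x = [
    [list(bits[0:2]), list(bits[2:4])]
    for bits in itertools.product([0, 1], repeat=4)
]
total_mass = sum(block_mixture_density(x, pi, rho, alpha) for x in all_x)
print(total_mass)
```

The number of labellings grows exponentially with the matrix size, so this enumeration is only feasible for toy inputs. That cost is the reason iterative algorithms are used in practice rather than direct evaluation.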
Summary
Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, micro-array, data mining, and so forth. [Dhillon (2001)] published an article on text data mining by simultaneously clustering the documents and content (words) using bipartite spectral graph partitioning. This is quite a useful technique, for instance for managing a huge corpus of unlabeled documents. The R package blockcluster estimates the parameters of the co-clustering models [Govaert and Nadif (2003)] for binary, contingency and continuous data. This package is unique in the generative models it implements (latent blocks) and the algorithms it uses (BEM, BCEM); in addition, special attention has been given to designing the library to handle very large data sets in reasonable time.