Abstract
Block clustering (also called co-clustering or simultaneous clustering) aims at simultaneously partitioning the rows and columns of a data table to reveal homogeneous block structures. This structure can stem from the latent block model, which provides a probabilistic model of data tables whose block patterns are defined by the row and column classes. For continuous data, each table entry is typically assumed to follow a Gaussian distribution whose parameters are common to all entries belonging to the same block, that is, sharing the same row and column classes. For a given data table, several candidate models are usually examined: they may differ in the number of clusters or, more generally, in the number of free parameters of the model. Model selection then becomes a critical issue, for which the tools that have been derived for model-based one-way clustering need to be adapted. We develop here a criterion based on an approximation of the Integrated Classification Likelihood (ICL) of block models, and propose a BIC-like variant of similar form. The proposed criteria are assessed on simulated data, where their performance is shown to be fairly reliable for medium to large data tables with well-separated clusters.
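As an illustration of the Gaussian latent block model described above, the following minimal sketch simulates a data table: row and column classes are drawn independently, and each entry is then drawn from a Gaussian whose mean and standard deviation depend only on its block. The function name `simulate_gaussian_lbm` and all of its parameters are hypothetical and chosen for this example; this is not the authors' code, and it does not implement the ICL or BIC-like criteria.

```python
import random


def simulate_gaussian_lbm(n_rows, n_cols, row_props, col_props, means, sds, seed=0):
    """Simulate a table from a Gaussian latent block model (illustrative sketch).

    row_props, col_props: mixing proportions of the row and column classes.
    means, sds: nested lists indexed as [row_class][col_class], giving the
    Gaussian parameters shared by all entries of a block (assumed layout).
    """
    rng = random.Random(seed)
    # Latent row classes z and column classes w, drawn independently.
    z = rng.choices(range(len(row_props)), weights=row_props, k=n_rows)
    w = rng.choices(range(len(col_props)), weights=col_props, k=n_cols)
    # Entry (i, j) depends only on the block (z[i], w[j]) it belongs to.
    x = [
        [rng.gauss(means[z[i]][w[j]], sds[z[i]][w[j]]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return x, z, w


# Example: 2 row classes and 2 column classes with well-separated block means.
means = [[0.0, 5.0], [10.0, -5.0]]
sds = [[1.0, 1.0], [1.0, 1.0]]
x, z, w = simulate_gaussian_lbm(6, 4, [0.5, 0.5], [0.5, 0.5], means, sds)
```

Candidate models with different numbers of row and column classes could then be fitted to such a table and compared with a model selection criterion; that step is outside the scope of this sketch.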