Estimation and model selection for model-based clustering with the conditional classification likelihood

Jean-Patrick Baudry

doi:10.1214/15-ejs1026

Abstract

The Integrated Completed Likelihood (ICL) criterion was introduced by Biernacki, Celeux and Govaert (2000) in the model-based clustering framework to select a relevant number of classes and has been used by statisticians in various application areas. A theoretical study of ICL is proposed. A contrast related to the clustering objective is introduced: the conditional classification likelihood. An estimator and model selection criteria are deduced. The properties of these new procedures are studied and ICL is proved to be an approximation of one of these criteria. We contrast these results with the current leading point of view about ICL, namely that it would not be consistent. Moreover, these results give insights into the class notion underlying ICL and feed a reflection on the class notion in clustering. General results on penalized minimum contrast criteria and upper bounds on the bracketing entropy in parametric situations are derived, which can be useful per se. Practical solutions for the computation of the introduced procedures are proposed, notably an adapted EM algorithm and a new initialization method for EM-like algorithms which helps to improve the estimation in Gaussian mixture models.

Highlights

Model-based clustering is introduced in Sections 1.1 and 1.2
The main topic of this work is the choice of the number of classes in a model-based clustering framework, and the choice of the number of components of a Gaussian mixture
Even for data arising from a mixture distribution, a relevant number of classes may differ from the true number of components of the mixture

Summary

Introduction

The main topic of this work is the choice of the number of classes in a model-based clustering framework, and the choice of the number of components of a Gaussian mixture. We prove that ICL is a penalized contrast criterion whose contrast differs from the standard likelihood: this explains why it is neither surprising nor a drawback that ICL does not asymptotically select the “true” number of components, even when the “true” model is considered. The reason why we introduce this new contrast Lcc (Section 2.1) is not that we believe it a priori to be the best one for a clustering purpose, but rather that it makes it possible to study and understand ICL theoretically.
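To make the difference between the standard likelihood and the contrast Lcc concrete, here is a minimal sketch for a one-dimensional Gaussian mixture with given parameters. It assumes the usual form of the conditional classification likelihood, log Lcc = log L − ENT, where ENT is the entropy of the conditional class probabilities, and a BIC-style penalty (D/2) log n. The function names, the toy data and the plugged-in parameter values are illustrative assumptions, not code from the paper; in particular, the paper's estimator maximizes Lcc, whereas this sketch only evaluates the quantities at fixed parameters.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_lcc_terms(data, weights, means, sds):
    """Return (log L, ENT, log Lcc) for a 1-D Gaussian mixture.

    Assumed definitions (illustrative, not copied from the paper):
      log L   = sum_i log sum_k pi_k * phi(x_i; mu_k, sd_k)
      ENT     = -sum_i sum_k tau_ik * log tau_ik
      log Lcc = log L - ENT
    where tau_ik is the conditional probability of component k given x_i.
    """
    log_lik = 0.0
    entropy = 0.0
    for x in data:
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        log_lik += math.log(total)
        for j in joint:
            tau = j / total
            if tau > 0.0:  # the term 0 * log 0 is taken to be 0
                entropy -= tau * math.log(tau)
    return log_lik, entropy, log_lik - entropy

def penalized(value, n_free_params, n_obs):
    """BIC-style penalized criterion: value - (D / 2) * log n."""
    return value - 0.5 * n_free_params * math.log(n_obs)

# Toy data: two well-separated groups (hypothetical values).
data = [-2.1, -1.9, -2.0, 2.0, 1.8, 2.2]
log_lik, ent, log_lcc = log_lcc_terms(data, [0.5, 0.5], [-2.0, 2.0], [0.2, 0.2])

# Free parameters of a 2-component 1-D mixture: 1 weight + 2 means + 2 sds = 5.
bic_like = penalized(log_lik, 5, len(data))  # penalized log-likelihood
icl_like = penalized(log_lcc, 5, len(data))  # penalized log Lcc
```

Since the entropy is non-negative, the penalized Lcc value never exceeds the penalized log-likelihood computed at the same parameters. The two nearly coincide when the components are well separated, and they diverge when components overlap, which is one way to see why a criterion built on Lcc may favor a number of classes different from the number of mixture components.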

Gaussian Mixture Models

Model-Based Clustering

A New Contrast

Estimation

Bracketing Entropy and Glivenko-Cantelli Property

Model Selection

Consistent Penalized Criteria

A New Light on ICL

Discussion

Proofs

Journal: Electronic Journal of Statistics	Publication Date: Jan 1, 2015
Citations: 42	License type: cc-by
