Abstract
Two methods for clustering data and choosing a mixture model are proposed. First, we derive a new classification algorithm based on the classification likelihood. Then, the likelihood conditional on these clusters is written as the product of the likelihoods of each cluster, and AIC- and BIC-type approximations, respectively, are applied. The resulting criteria turn out to be the sum of the AIC or BIC relative to each cluster plus an entropy term. The performance of our methods is evaluated in Monte Carlo simulations and on a real data set, showing in particular that the iterative estimation algorithm generally converges quickly, so the computational load is rather low.
Highlights
Because of their ability to represent relationships in data, finite mixture models are commonly used for summarizing distributions
A penalization is provided in a Bayesian framework by [4], who proposed a criterion based on the integrated completed likelihood (ICL)
We propose two alternative approaches, based on the AIC and BIC criteria applied to the classification likelihood
Summary
Because of their ability to represent relationships in data, finite mixture models are commonly used for summarizing distributions. The AIC [1, 2] and BIC [26] criteria are based on such likelihoods, as is the algorithm provided by [11] for estimating a mixture model. For assessing the number of clusters arising from a Gaussian mixture model, [5, 6] used a penalized completed likelihood (CL). A penalization is provided in a Bayesian framework by [4], who proposed a criterion based on the integrated completed likelihood (ICL); their method consists of approximating the integrated completed likelihood by the BIC. We propose two alternative approaches, based on the AIC and BIC criteria applied to the classification likelihood.
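To make the structure of such criteria concrete, the following is a minimal NumPy sketch of an ICL-type quantity: a BIC-type penalized log-likelihood further penalized by the entropy of the fuzzy classification, in the spirit of [4]. The function names, the fixed two-component Gaussian mixture, and the parameter count are illustrative assumptions, not the paper's actual estimators or algorithm.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def entropy_term(tau):
    # EN(tau) = -sum_{i,k} tau_ik log tau_ik, with 0 * log 0 treated as 0
    t = np.clip(tau, 1e-300, 1.0)
    return -np.sum(tau * np.log(t))

def icl_like(loglik, tau, n_params, n):
    # BIC-type criterion minus the classification entropy (illustrative form)
    return loglik - 0.5 * n_params * np.log(n) - entropy_term(tau)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 100), rng.normal(3, 1, 100)])

# Fixed (not estimated) parameters of a two-component mixture, for illustration
w, mu, sd = np.array([0.5, 0.5]), np.array([-3.0, 3.0]), np.array([1.0, 1.0])
dens = w * gauss_pdf(x[:, None], mu, sd)      # (n, K) weighted component densities
loglik = np.sum(np.log(dens.sum(axis=1)))     # mixture log-likelihood
tau = dens / dens.sum(axis=1, keepdims=True)  # posterior responsibilities

print(icl_like(loglik, tau, n_params=5, n=len(x)))
```

When the clusters are well separated, the responsibilities are nearly 0/1, the entropy term is close to zero, and the criterion essentially reduces to the BIC; overlapping clusters inflate the entropy penalty.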