Abstract

Two methods for clustering data and choosing a mixture model are proposed. First, we derive a new classification algorithm based on the classification likelihood. Then, the likelihood conditional on these clusters is written as the product of the likelihoods of the individual clusters, and AIC- and BIC-type approximations are applied to it. The resulting criteria are the sum of the per-cluster AIC (respectively BIC) values plus an entropy term. The performance of our methods is evaluated by Monte Carlo experiments and on a real data set; in particular, the iterative estimation algorithm generally converges quickly, so the computational load is rather low.
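A minimal sketch of both steps follows. It assumes one-dimensional data and Gaussian components, neither of which the abstract fixes. The hypothetical function `cem_gaussian_1d` is a generic classification-EM loop: each pass assigns every point to the component maximizing pi_k f(x; theta_k), then re-estimates each cluster's parameters from its own members. This is one standard way to increase the classification likelihood, not necessarily the authors' algorithm. The hypothetical function `cluster_criterion` sums per-cluster AIC or BIC values and adds an entropy term. The form of that term is our assumption: writing the classification log-likelihood as sum_k log L_k(theta_k) + sum_k n_k log(n_k / n), with estimated proportions n_k / n, the proportions contribute 2 n H on the -2 log L scale, where H = -sum_k (n_k / n) log(n_k / n). The parameter count per cluster (2) and the use of log n_k in each BIC penalty are likewise assumptions.

```python
import math


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def cem_gaussian_1d(data, k, max_iter=100, var_floor=1e-6):
    """Classification EM (hard assignments) for a 1-D Gaussian mixture.

    Alternates a C-step (assign each point to its best component) and an
    M-step (maximum-likelihood estimates computed within each cluster).
    Returns (labels, means, variances).
    """
    n = len(data)
    xs = sorted(data)
    # Deterministic start: means at evenly spaced order statistics and a
    # common variance equal to the overall sample variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    labels = None
    for _ in range(max_iter):
        # C-step: argmax_j of log pi_j + log f(x; mu_j, var_j) for each x.
        new_labels = [
            max(range(k), key=lambda j: math.log(weights[j])
                + normal_logpdf(x, means[j], variances[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # partition unchanged, so the parameters are unchanged too
        labels = new_labels
        # M-step: per-cluster maximum-likelihood estimates and proportions.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if not members:
                raise ValueError("empty cluster; try another k or start")
            m = sum(members) / len(members)
            means[j] = m
            variances[j] = max(sum((x - m) ** 2 for x in members) / len(members),
                               var_floor)
            weights[j] = len(members) / n
    return labels, means, variances


def cluster_criterion(data, labels, means, variances, kind="bic"):
    """Sum of per-cluster AIC or BIC plus an entropy term (lower is better).

    ASSUMED form, not taken from the paper: p_j = 2 parameters per cluster,
    BIC penalty p_j * log(n_j), entropy term 2 * n * H of the proportions.
    """
    if kind not in ("aic", "bic"):
        raise ValueError("kind must be 'aic' or 'bic'")
    n = len(data)
    total = 0.0
    entropy = 0.0
    for j, (m, v) in enumerate(zip(means, variances)):
        members = [x for x, lab in zip(data, labels) if lab == j]
        n_j = len(members)
        loglik_j = sum(normal_logpdf(x, m, v) for x in members)
        p_j = 2  # one mean and one variance
        penalty = p_j * math.log(n_j) if kind == "bic" else 2 * p_j
        total += -2.0 * loglik_j + penalty
        entropy -= (n_j / n) * math.log(n_j / n)
    return total + 2.0 * n * entropy
```

In use, one would run both functions for several candidate numbers of clusters and keep the one with the smallest criterion value. The deterministic start keeps the sketch reproducible, but on harder data it can settle in a poor local optimum, so several starts would normally be tried.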


Summary

Introduction

Because of their ability to represent relationships in data, finite mixture models are commonly used for summarizing distributions. The AIC [1, 2] and BIC [26] criteria, as well as the algorithm of [11] for estimating a mixture model, are based on the likelihood of the mixture model. To assess the number of clusters arising from a Gaussian mixture model, [5, 6] used a penalized completed likelihood (CL). A penalization in a Bayesian framework is provided by [4], who proposed a criterion based on the integrated completed likelihood (ICL); their method consists of approximating the integrated completed likelihood by the BIC. We propose two alternative approaches, based on the AIC and BIC criteria applied to the classification likelihood.
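To make the contrast between these criteria concrete, here is a small sketch for a one-dimensional Gaussian mixture whose parameters are assumed to have been estimated already (for example by EM). The hypothetical function `mixture_criteria` returns AIC, BIC and a BIC-type approximation of ICL, all on the lower-is-better scale -2 log L + penalty. The ICL value is BIC minus twice the sum, over observations, of the log of each observation's largest posterior probability; this is how the completed likelihood at the maximum a posteriori classification enters, and it acts as an entropy-like penalty that grows when clusters overlap. It is a generic illustration under these assumptions, and software packages differ in sign conventions.

```python
import math


def mixture_criteria(data, weights, means, variances):
    """AIC, BIC and ICL-BIC (all lower is better) for a 1-D Gaussian mixture
    with given (assumed maximum-likelihood) parameters."""
    n, k = len(data), len(weights)
    p = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    loglik = 0.0
    map_log_post = 0.0  # sum over i of log t_{i, z_i}, z_i the MAP component
    for x in data:
        # log(pi_j * f(x; mu_j, var_j)) for every component j
        terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)
        # log f(x) via log-sum-exp for numerical stability
        log_fx = top + math.log(sum(math.exp(t - top) for t in terms))
        loglik += log_fx
        map_log_post += top - log_fx  # log of the largest posterior probability
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(n)
    icl = bic - 2 * map_log_post  # non-negative extra penalty for overlap
    return aic, bic, icl
```

With a single component every posterior probability equals 1, so the ICL value coincides with the BIC; the two diverge only when observations are ambiguous between components.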

Model-Based Clustering
Existing Methods
Some New Approaches
Numerical Examples
Old Faithful Geyser
Monte Carlo Experiment 0
Monte Carlo Experiment 1
Monte Carlo Experiment 2
Classification Performance
Findings
Discussion and Concluding