Abstract
Two methods for clustering data and choosing a mixture model are proposed. First, we derive a new classification algorithm based on the classification likelihood. Then, the likelihood conditional on these clusters is written as the product of the likelihoods of each cluster, and AIC- and BIC-type approximations, respectively, are applied. The resulting criteria turn out to be the sum of the AIC or BIC relative to each cluster plus an entropy term. The performance of our methods is evaluated in Monte Carlo simulations and on a real data set, showing in particular that the iterative estimation algorithm generally converges quickly, so the computational load is rather low.
Highlights
Because of their ability to represent relationships in data, finite mixture models are commonly used for summarizing distributions
A penalization is provided in a Bayesian framework by [4], who proposed a criterion based on the integrated completed likelihood (ICL)
We propose two alternative approaches, based on the AIC and BIC criteria applied to the classification likelihood
Summary
Because of their ability to represent relationships in data, finite mixture models are commonly used for summarizing distributions. The AIC [1, 2] and BIC [26] criteria are based on such likelihoods, as is the algorithm provided by [11] for estimating a mixture model. For assessing the number of clusters arising from a Gaussian mixture model, [5, 6] used a penalized completed likelihood (CL). A penalization is provided in a Bayesian framework by [4], who proposed a criterion based on the integrated completed likelihood (ICL); their method consists of approximating the integrated completed likelihood by the BIC. We propose two alternative approaches, based on the AIC and BIC criteria applied to the classification likelihood.
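To make the structure of such criteria concrete, the following is a minimal NumPy sketch of an ICL-type quantity: a BIC-type penalized log-likelihood further penalized by the entropy of the fuzzy classification, in the spirit of [4]. The function names, the fixed two-component Gaussian mixture, and the parameter count are illustrative assumptions, not the paper's actual estimators or algorithm.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def entropy_term(tau):
    # EN(tau) = -sum_{i,k} tau_ik log tau_ik, with 0 * log 0 treated as 0
    t = np.clip(tau, 1e-300, 1.0)
    return -np.sum(tau * np.log(t))

def icl_like(loglik, tau, n_params, n):
    # BIC-type criterion minus the classification entropy (illustrative form)
    return loglik - 0.5 * n_params * np.log(n) - entropy_term(tau)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 100), rng.normal(3, 1, 100)])

# Fixed (not estimated) parameters of a two-component mixture, for illustration
w, mu, sd = np.array([0.5, 0.5]), np.array([-3.0, 3.0]), np.array([1.0, 1.0])
dens = w * gauss_pdf(x[:, None], mu, sd)      # (n, K) weighted component densities
loglik = np.sum(np.log(dens.sum(axis=1)))     # mixture log-likelihood
tau = dens / dens.sum(axis=1, keepdims=True)  # posterior responsibilities

print(icl_like(loglik, tau, n_params=5, n=len(x)))
```

When the clusters are well separated, the responsibilities are nearly 0/1, the entropy term is close to zero, and the criterion essentially reduces to the BIC; overlapping clusters inflate the entropy penalty.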