BayesBinMix: an R Package for Model Based Clustering of Multivariate Binary Data

Panagiotis Papastamoulis, Magnus Rattray

doi:10.32614/rj-2017-022

Abstract

The BayesBinMix package offers a Bayesian framework for clustering binary data with or without missing values by fitting mixtures of multivariate Bernoulli distributions with an unknown number of components. It allows the joint estimation of the number of clusters and model parameters using Markov chain Monte Carlo sampling. Heated chains are run in parallel and accelerate the convergence to the target posterior distribution. Identifiability issues are addressed by implementing label switching algorithms. The package is demonstrated and benchmarked against the Expectation-Maximization algorithm using a simulation study as well as a real dataset.
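As an illustration of the model named in the abstract, the observed-data log-likelihood of a mixture of multivariate Bernoulli distributions can be sketched as follows. This is a minimal Python sketch and not the package's code: the function name `bernoulli_mixture_loglik`, the use of `None` for missing entries, and the choice to drop missing entries from the product (which assumes values are missing at random and variables are independent within a component) are all assumptions made for the example.

```python
import math

def bernoulli_mixture_loglik(data, weights, theta):
    """Observed-data log-likelihood of a mixture of multivariate Bernoullis.

    data    : list of rows, each a list of 0, 1 or None (None = missing)
    weights : mixing proportions, one per component, summing to 1
    theta   : theta[k][j] = success probability of variable j in component k
    (All names are hypothetical; they are not the BayesBinMix API.)
    """
    total = 0.0
    for row in data:
        comp_logs = []
        for w, probs in zip(weights, theta):
            # log of w_k * prod_j theta_kj^x_j * (1 - theta_kj)^(1 - x_j)
            lp = math.log(w)
            for x, p in zip(row, probs):
                if x is None:
                    # Assumption: a missing entry is summed out and contributes 1.
                    continue
                lp += math.log(p) if x == 1 else math.log(1.0 - p)
            comp_logs.append(lp)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(v - m) for v in comp_logs))
    return total
```

With a single component and all probabilities equal to 0.5, each observed entry contributes log(0.5) and each missing entry contributes nothing, which gives a quick sanity check of the function.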

Highlights

Clustering data is a fundamental task in a wide range of applications, and finite mixture models are widely used for this purpose (McLachlan and Peel, 2000; Marin et al, 2005; Frühwirth-Schnatter, 2006).
The likelihood surface of a mixture model can exhibit many local maxima, and it is well known that the EM algorithm may fail to converge to the main mode if it is initialized from a point close to a minor mode.
This function takes as input a binary data array and runs the allocation sampler for a series of heated chains, which run in parallel while swaps between pairs of chains are proposed.
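The swap proposals between heated chains mentioned above can be illustrated with a generic Metropolis-coupled MCMC step. The sketch below is not taken from the package. It assumes that chain c targets the posterior raised to the power `heats[c]` (with heat 1 for the cold chain) and that the pair of chains is chosen uniformly at random; the function names and arguments are hypothetical.

```python
import math
import random

def swap_log_ratio(logpost_i, logpost_j, heat_i, heat_j):
    """Log Metropolis ratio for exchanging the states of chains i and j.

    logpost_* are unheated log-posterior values of the current states.
    Assumption: chain c targets posterior ** heat_c.
    """
    return (heat_i - heat_j) * (logpost_j - logpost_i)

def propose_swap(states, logposts, heats, rng=random):
    """Pick two distinct chains at random and swap their states with the
    Metropolis acceptance probability. Returns True if the swap is accepted."""
    i, j = rng.sample(range(len(states)), 2)
    log_r = swap_log_ratio(logposts[i], logposts[j], heats[i], heats[j])
    # Accept with probability min(1, exp(log_r)).
    if log_r >= 0 or rng.random() < math.exp(log_r):
        states[i], states[j] = states[j], states[i]
        logposts[i], logposts[j] = logposts[j], logposts[i]
        return True
    return False
```

Under these assumptions, a swap that moves a higher-posterior state into the colder chain is always accepted, which is how heated chains can help the cold chain leave a minor mode.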

Summary

Introduction

Clustering data is a fundamental task in a wide range of applications, and finite mixture models are widely used for this purpose (McLachlan and Peel, 2000; Marin et al, 2005; Frühwirth-Schnatter, 2006). The Bayesian framework allows one to place a prior distribution on both the number of clusters and the model parameters and to sample (approximately) from the joint posterior distribution using Markov chain Monte Carlo (MCMC) algorithms (Richardson and Green, 1997; Stephens, 2000a; Nobile and Fearnside, 2007; White et al, 2016). This does not mean that the Bayesian approach is free of problems.

We use the allocation sampler. Nobile and Fearnside (2007) introduced this sampling scheme for parametric families for which conjugate prior distributions exist and applied it in the specific context of mixtures of normal distributions. This approach was recently followed by White et al (2016), who also allowed for variable selection.

(a) Propose a reallocation of the observations assigned to the ejecting component between itself and the ejected component according to the Beta(α, α) distribution.
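One plausible reading of step (a) is sketched below in Python. The assumption, labeled as such, is that a probability p is drawn from Beta(α, α) and each observation currently in the ejecting component moves to the ejected component independently with probability p. The function name and arguments are hypothetical, and the accept/reject step that would follow the proposal is not shown.

```python
import random

def propose_ejection_reallocation(z, ejecting, ejected, alpha, rng=random):
    """Return a proposed allocation vector for an ejection move.

    z        : current component labels, one per observation
    ejecting : label of the component that ejects
    ejected  : label of the component being created
    alpha    : parameter of the symmetric Beta(alpha, alpha) distribution
    (Hypothetical helper; not the BayesBinMix API.)
    """
    # Assumption: p is the probability that an observation moves
    # from the ejecting component to the ejected component.
    p = rng.betavariate(alpha, alpha)
    proposal = list(z)
    for idx, label in enumerate(z):
        if label == ejecting and rng.random() < p:
            proposal[idx] = ejected
    return proposal
```

Observations assigned to any other component are left untouched, so the proposal only redistributes the members of the ejecting component between the two components involved.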

If an absorption is attempted:

Quantiles for each variable

Method

Summary and remarks


Journal: The R Journal
Publication Date: Jan 1, 2017
Citations: 8
License type: cc-by


