Abstract

Bayesian nonparametric (BNP) infinite-mixture models provide flexible and accurate density estimation, cluster analysis, and regression. However, for posterior inference in such models, MCMC algorithms are complex, often need to be tailor-made for different BNP priors, and are intractable for large datasets. We introduce a BNP classification annealing EM algorithm that employs importance sampling estimation. This new fast-search algorithm, for virtually any given BNP mixture model, can quickly and accurately calculate the posterior predictive density estimate (by posterior averaging) and the maximum a posteriori clustering estimate (by simulated annealing), even for datasets containing millions of observations. The algorithm can handle a wide range of BNP priors because it primarily relies on the ability to generate prior samples. The algorithm can be fast because in each iteration it performs a sampling step for the (missing) clustering of the data points, instead of a costly E-step, and then performs direct posterior calculations in the M-step, given the sampled (imputed) clustering. The new algorithm is illustrated and evaluated through BNP Gaussian mixture model analyses of benchmark simulated data and real datasets. MATLAB code for the new algorithm is provided in the supplementary materials. Supplementary materials for this article are available online.
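To make the "sample labels instead of an E-step, then update parameters directly" idea concrete, here is a minimal illustrative sketch in Python (not the paper's MATLAB implementation, and using a plain finite Gaussian mixture rather than a BNP prior): each iteration samples a cluster label for every data point from its responsibilities, then re-estimates the parameters from the imputed clustering.

```python
# Illustrative sketch only: one classification/stochastic EM pass for a
# 1-D Gaussian mixture. The deterministic E-step is replaced by *sampling*
# cluster labels; the M-step then uses direct estimates given the sampled
# (imputed) clustering. Cluster count k is fixed here, unlike a BNP model.
import numpy as np

def cem_iteration(x, means, sds, weights, rng):
    """One iteration: sample labels from responsibilities, then re-estimate."""
    k = len(means)
    # Per-cluster log joint density of each point, shape (n, k).
    logp = (np.log(weights)
            - 0.5 * np.log(2 * np.pi * sds**2)
            - 0.5 * ((x[:, None] - means) / sds) ** 2)
    # Normalize to responsibilities and sample one label per point
    # (this sampling step stands in for the costly E-step).
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    labels = np.array([rng.choice(k, p=row) for row in p])
    # M-step: direct parameter estimates given the imputed clustering.
    for j in range(k):
        xj = x[labels == j]
        if xj.size:
            weights[j] = xj.size / x.size
            means[j] = xj.mean()
            sds[j] = xj.std() + 1e-6  # guard against degenerate clusters
    return means, sds, weights / weights.sum(), labels

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian clusters.
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
means = np.array([-1.0, 1.0])
sds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
for _ in range(20):
    means, sds, weights, labels = cem_iteration(x, means, sds, weights, rng)
```

Because the sampling step imputes a hard clustering, the M-step reduces to closed-form per-cluster calculations, which is what makes each iteration cheap relative to a full E-step; annealing the sampling temperature (not shown) is what the paper uses to steer this toward a maximum a posteriori clustering.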
