Abstract

Bayesian nonparametric (BNP) infinite-mixture models provide flexible and accurate density estimation, cluster analysis, and regression. However, for posterior inference in such models, MCMC algorithms are complex, often need to be tailor-made for different BNP priors, and are intractable for large datasets. We introduce a BNP classification annealing EM algorithm that employs importance sampling estimation. This new fast-search algorithm, for virtually any given BNP mixture model, can quickly and accurately calculate the posterior predictive density estimate (by posterior averaging) and the maximum a posteriori clustering estimate (by simulated annealing), even for datasets containing millions of observations. The algorithm can handle a wide range of BNP priors because it primarily relies on the ability to generate prior samples. The algorithm can be fast because in each iteration it performs a sampling step for the (missing) clustering of the data points, instead of a costly E-step, and then performs direct posterior calculations in the M-step, given the sampled (imputed) clustering. The new algorithm is illustrated and evaluated through BNP Gaussian mixture model analyses of benchmark simulated data and real datasets. MATLAB code for the new algorithm is provided in the supplementary materials. Supplementary materials for this article are available online.
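To make the "sample labels instead of an E-step, then update parameters directly" idea concrete, here is a minimal illustrative sketch in Python (not the paper's MATLAB implementation, and using a plain finite Gaussian mixture rather than a BNP prior): each iteration samples a cluster label for every data point from its responsibilities, then re-estimates the parameters from the imputed clustering.

```python
# Illustrative sketch only: one classification/stochastic EM pass for a
# 1-D Gaussian mixture. The deterministic E-step is replaced by *sampling*
# cluster labels; the M-step then uses direct estimates given the sampled
# (imputed) clustering. Cluster count k is fixed here, unlike a BNP model.
import numpy as np

def cem_iteration(x, means, sds, weights, rng):
    """One iteration: sample labels from responsibilities, then re-estimate."""
    k = len(means)
    # Per-cluster log joint density of each point, shape (n, k).
    logp = (np.log(weights)
            - 0.5 * np.log(2 * np.pi * sds**2)
            - 0.5 * ((x[:, None] - means) / sds) ** 2)
    # Normalize to responsibilities and sample one label per point
    # (this sampling step stands in for the costly E-step).
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    labels = np.array([rng.choice(k, p=row) for row in p])
    # M-step: direct parameter estimates given the imputed clustering.
    for j in range(k):
        xj = x[labels == j]
        if xj.size:
            weights[j] = xj.size / x.size
            means[j] = xj.mean()
            sds[j] = xj.std() + 1e-6  # guard against degenerate clusters
    return means, sds, weights / weights.sum(), labels

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian clusters.
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
means = np.array([-1.0, 1.0])
sds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
for _ in range(20):
    means, sds, weights, labels = cem_iteration(x, means, sds, weights, rng)
```

Because the sampling step imputes a hard clustering, the M-step reduces to closed-form per-cluster calculations, which is what makes each iteration cheap relative to a full E-step; annealing the sampling temperature (not shown) is what the paper uses to steer this toward a maximum a posteriori clustering.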
