Abstract

The Dirichlet Process (DP) is commonly used as a nonparametric prior on mixture models. Its adaptive model-selection capability is useful in clustering applications. Although exact inference is intractable for this prior, Markov chain Monte Carlo (MCMC) samplers have been used to approximate the target posterior distribution; however, these samplers often do not scale well, so recent studies have focused on improving run-time efficiency through parallelization. In this paper, we introduce a new sampling method for the DP that combines the Chinese Restaurant Process (CRP) with the stick-breaking construction, allowing parallelization through conditional independence at the data-point level. The stick-breaking part uses an uncollapsed sampler, providing a high degree of parallelism, while the CRP part uses a collapsed sampler, allowing more accurate clustering. We show that this partially collapsed Gibbs sampler has significant scalability advantages over the collapsed-only version. We also provide results on real-world data sets showing that the proposed inference algorithm compares favorably against recently introduced parallel Dirichlet Process samplers in terms of F1 score while maintaining comparable run-time performance.
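The two constructions combined in the abstract can be illustrated in isolation. The sketch below is not the paper's partially collapsed sampler; it only shows, under an assumed concentration parameter `alpha`, (a) the truncated stick-breaking construction that generates mixture weights and (b) the sequential CRP seating process that generates cluster assignments. Function names and the truncation level are illustrative choices, not from the paper.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking construction:
    v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= (1.0 - v)
    return weights

def crp_assignments(alpha, num_points, rng):
    """Chinese Restaurant Process: customer i joins existing table t
    with probability n_t / (i + alpha), or a new table with
    probability alpha / (i + alpha)."""
    counts = []       # customers seated at each table
    assignments = []
    for i in range(num_points):
        r = rng.random() * (i + alpha)
        acc = 0.0
        table = len(counts)  # default: open a new table
        for t, n_t in enumerate(counts):
            acc += n_t
            if r < acc:
                table = t
                break
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments

rng = random.Random(0)
w = stick_breaking_weights(2.0, 50, rng)   # mixture weights (sum < 1 due to truncation)
z = crp_assignments(2.0, 100, rng)         # cluster labels for 100 points
```

In the uncollapsed (stick-breaking) representation the weights are sampled explicitly, which is what makes data points conditionally independent given the weights and thus parallelizable; the collapsed (CRP) representation integrates the weights out, coupling the assignments but typically mixing more accurately.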
