Abstract

BNPmix is an R package for Bayesian nonparametric multivariate density estimation, clustering, and regression, using Pitman-Yor mixture models, a flexible and robust generalization of the popular class of Dirichlet process mixture models. A variety of model specifications and state-of-the-art posterior samplers are implemented. In order to achieve computational efficiency, all sampling methods are written in C++ and seamlessly integrated into R by means of the Rcpp and RcppArmadillo packages. BNPmix exploits the ggplot2 capabilities and implements a series of generic functions to plot and print summaries of posterior densities and induced clustering of the data.

Highlights

  • Bayesian nonparametric (BNP) methods provide flexible solutions to complex problems and data which are not described by parametric models (Hjort, Holmes, Müller, and Walker 2010; Müller, Quintana, Jara, and Hanson 2015)

  • In order to clarify which features are specific to BNPmix and which are shared by other packages, we review state-of-the-art R packages for BNP inference via Markov chain Monte Carlo (MCMC)

  • The BNPmix package consists of three main R functions, wrappers of C++ routines which implement the BNP models described in Section 2 and the MCMC simulation methods introduced in Section 3, along with some user-friendly functions which facilitate the elicitation of prior distributions and the post-processing of generated posterior samples

Summary

Introduction

Bayesian nonparametric (BNP) methods provide flexible solutions to complex problems and data which are not described by parametric models (Hjort, Holmes, Müller, and Walker 2010; Müller, Quintana, Jara, and Hanson 2015). The DPpackage by Jara, Hanson, Quintana, Müller, and Rosner (2011) is probably the most comprehensive of the packages we considered. It is mainly written in Fortran and consists of a rich collection of functions implementing some of the most successful Bayesian nonparametric and semi-parametric models, including Dirichlet process (DP) and dependent Dirichlet process (DDP) mixtures, hierarchical DP, Pólya trees, and random Bernstein polynomials. At the same time, when the focus is on the use of the Pitman-Yor (PY) process, BNPmix plays a leading role. It is worth mentioning the increasing attention recently dedicated by the BNP literature to variational methods approximating the posterior distribution (Blei and Jordan 2006; Hughes, Kim, and Sudderth 2015; Campbell, Straub, Fisher III, and How 2015; Tank, Foti, and Fox 2015); the availability of R packages implementing such approaches for BNP models is rather limited, though, a notable exception being the package MixDir (Ahlmann-Eltze and Yau 2018), which implements a hierarchical DP mixture of multinomial kernels. A further comparison with other R packages for BNP inference and technical details on the parametrization of the implemented models are provided in the appendix.
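
For readers unfamiliar with the Pitman-Yor (PY) process, the following sketch recalls the generic form of a PY mixture model and its predictive rule. The notation (kernel K, discount σ, strength ϑ, base measure P_0) is an assumption made here for illustration and may differ from the parametrization adopted in the paper and its appendix.

```latex
% Generic Pitman-Yor mixture model (illustrative notation, requires amsmath).
% sigma in [0,1) is the discount, vartheta > -sigma the strength, P_0 the base measure.
\begin{align*}
  \tilde f(y) &= \int_{\Theta} \mathcal{K}(y;\theta)\,\tilde p(\mathrm{d}\theta),
  \qquad \tilde p \sim \mathrm{PY}(\sigma,\vartheta;P_0),\\[4pt]
  \mathbb{P}(\theta_{n+1}\in\cdot \mid \theta_1,\dots,\theta_n)
  &= \frac{\vartheta + k\sigma}{\vartheta + n}\,P_0(\cdot)
   + \sum_{j=1}^{k} \frac{n_j-\sigma}{\vartheta+n}\,\delta_{\theta_j^*}(\cdot).
\end{align*}
% Here theta_1^*, ..., theta_k^* are the k distinct values among theta_1, ..., theta_n,
% with frequencies n_1, ..., n_k summing to n.
```

Setting σ = 0 recovers the Dirichlet process with concentration parameter ϑ, which is the sense in which PY mixtures generalize DP mixtures. With σ > 0, the probability of creating a new cluster grows with the number k of clusters already observed.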

Model specifications
Posterior simulation methods
Package implementation
Low-level implementation
Wrappers to the main functions
Package scalability
Usage of the package
Univariate density estimation
Multivariate density estimation
Density regression
Density estimation for correlated samples
Packages comparison
Findings
Base measures and hyperdistributions