Characterizing the Distribution of Heterogeneity: Distributed Markov Chain Monte Carlo for Bayesian Hierarchical Models

Federico (Rico) Bumbaca

doi:10.2139/ssrn.3866257

Abstract

This article proposes a distributed Markov chain Monte Carlo (MCMC) algorithm for estimating Bayesian hierarchical models when the panel size is extremely large (in the millions of consumers) and the objects of interest are the distribution of heterogeneity and the parameters that characterize it. Extant distributed MCMC methods are inherently inefficient, statistically and computationally, because they require the estimation of both the consumer-level parameters and the distribution of heterogeneity. The approach we present bypasses the estimation of the consumer-level parameters. The two-stage algorithm is asymptotically exact, has excellent variance properties, retains the flexibility of a standard MCMC algorithm, and is easy to implement. The details of the algorithm depend on the form of the prior imposed on the hierarchical model. All three possibilities for the prior are considered: i) nonparametric, ii) exponential family, and iii) nonexponential family, such as a finite mixture. The first stage constructs an estimator of the posterior predictive distribution of the consumer-level parameters, which is also the distribution of heterogeneity. For a nonparametric prior, a second stage is not needed since, by definition, the common parameters that characterize the distribution of heterogeneity are already known. For the two parametric priors (exponential and nonexponential families) for which the common parameters that characterize the distribution of heterogeneity are desired, the second stage draws auxiliary variables from the posterior predictive distribution before directly drawing the common parameters. The proposed algorithm takes particular advantage of exponential family priors by first reducing the auxiliary variables to the sufficient statistics that parameterize the posterior distribution of heterogeneity before drawing the common parameters.
Although both stages are embarrassingly parallel, the second stage is sufficiently fast that a serial implementation may be computationally tractable. By avoiding the extensive computational, memory and network resources related to drawing, storing and communicating consumer-level parameters, the algorithm dominates the single-machine benchmark algorithm in computational and statistical efficiency by several orders of magnitude.
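
The second-stage idea for an exponential family prior (reduce auxiliary draws to sufficient statistics, then draw the common parameters) can be illustrated with a minimal sketch. This is not the article's implementation. It assumes a scalar normal distribution of heterogeneity with a conjugate Normal-Inverse-Gamma prior, and it replaces the first-stage posterior predictive draws with simulated stand-in values. The function names `sufficient_stats` and `draw_common_params` and the hyperparameters `m0`, `k0`, `a0`, `b0` are hypothetical.

```python
import numpy as np

def sufficient_stats(theta):
    """Reduce auxiliary draws to (count, sum, sum of squares)."""
    theta = np.asarray(theta, dtype=float)
    return theta.size, theta.sum(), np.square(theta).sum()

def draw_common_params(stats, rng, n_draws=1000, m0=0.0, k0=1e-3, a0=1.0, b0=1.0):
    """Draw (mu, sigma2) from the conjugate Normal-Inverse-Gamma posterior.

    Assumed prior (illustrative only):
    mu | sigma2 ~ N(m0, sigma2 / k0), sigma2 ~ Inverse-Gamma(a0, b0).
    """
    n, s1, s2 = stats
    mean = s1 / n
    ss = s2 - n * mean**2                      # sum of squared deviations
    kn = k0 + n
    mn = (k0 * m0 + s1) / kn
    an = a0 + n / 2
    bn = b0 + 0.5 * ss + 0.5 * k0 * n * (mean - m0) ** 2 / kn
    # Inverse-Gamma(an, bn) draw = reciprocal of Gamma(shape=an, rate=bn)
    sigma2 = 1.0 / rng.gamma(shape=an, scale=1.0 / bn, size=n_draws)
    mu = rng.normal(mn, np.sqrt(sigma2 / kn))
    return mu, sigma2

rng = np.random.default_rng(0)
# Stand-in for auxiliary draws from the first-stage posterior predictive
# distribution (assumption: simulated here as N(2.0, 0.5^2)).
aux = rng.normal(2.0, 0.5, size=100_000)

# Each worker reduces its shard; the statistics combine by simple addition.
shards = np.array_split(aux, 4)
stats = tuple(map(sum, zip(*(sufficient_stats(s) for s in shards))))

mu, sigma2 = draw_common_params(stats, rng)
```

Because the sufficient statistics are additive across shards, only three numbers per worker need to be communicated in this toy setting, which is consistent with the abstract's point that the second stage is cheap enough to run serially.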

Full Text