Scalable Bayes under Informative Sampling

Terrance D Savitsky,Sanvesh Srivastava

doi:10.1111/sjos.12312

Abstract

AbstractBayesian hierarchical formulations are utilized by the U.S. Bureau of Labor Statistics (BLS) with respondent‐level data for missing item imputation because these formulations are readily parameterized to capture correlation structures. BLS collects survey data under informative sampling designs that assign probabilities of inclusion to be correlated with the response on which sampling‐weighted pseudo posterior distributions are estimated for asymptotically unbiased inference about population model parameters. Computation is expensive and does not support BLS production schedules. We propose a new method to scale the computation that divides the data into smaller subsets, estimates a sampling‐weighted pseudo posterior distribution, in parallel, for every subset and combines the pseudo posterior parameter samples from all the subsets through their mean in the Wasserstein space of order 2. We construct conditions on a class of sampling designs where posterior consistency of the proposed method is achieved. We demonstrate on both synthetic data and in application to the Current Employment Statistics survey that our method produces results of similar accuracy as the usual approach while offering substantially faster computation.

Full Text