Post-inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making

William Neiswanger

doi:10.1184/r1/11898237.v1

Abstract

Probabilistic modeling refers to a set of techniques for modeling data that allows one to specify assumptions about the processes that generate data, incorporate prior beliefs about models, and infer properties of these models given observed data. Benefits include uncertainty quantification, multiple plausible solutions, reduction of overfitting, better performance given small data or large models, and explicit incorporation of a priori knowledge and problem structure. In recent decades, an array of inference algorithms have been developed to estimate these models.

This thesis focuses on post-inference methods, which are procedures that can be applied after the completion of standard inference algorithms to allow for increased efficiency, accuracy, or parallelism when learning probabilistic models of big data sets. These methods also allow for scalable computation in distributed or online settings, incorporation of complex prior information, and better use of inference results in downstream tasks. A few examples include:

• Embarrassingly parallel inference. Large data sets are often distributed over a collection of machines. We first compute an inference result (e.g. with Markov chain Monte Carlo or variational inference) on each machine, in parallel, without communication between machines. Afterwards, we combine the results to yield an inference result for the full data set.
• Prior swapping. Certain model priors limit the number of applicable inference algorithms, or increase their computational cost. We first choose any “convenient prior” (e.g. a conjugate prior, or a prior that allows for computationally cheap inference), and compute an inference result. Afterwards, we use this result to efficiently perform inference with other, more sophisticated priors or regularizers.
• Sequential decision making and optimization. Model-based sequential decision making and optimization methods use models to define acquisition functions. We compute acquisition functions using the inference result from any probabilistic program or model framework, and perform efficient inference in sequential settings.

We also describe the benefits of combining the above methods, present methodology for applying the embarrassingly parallel procedures when the number of machines is dynamic or unknown at inference time, illustrate how these methods can be applied for spatiotemporal analysis and in covariate dependent models, show ways to optimize these methods by incorporating test-functions of interest, and demonstrate how these methods can be implemented in probabilistic programming frameworks for automatic deployment.
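
As a rough illustration of the embarrassingly parallel inference example described above, the following toy sketch splits data across machines, computes a posterior on each shard independently, and then combines the results. It is not the thesis's own procedure: it assumes a conjugate Gaussian-mean model with known noise variance so that each local posterior is available in closed form, it assumes each machine uses the prior raised to the power 1/M (here a Gaussian with M times the prior variance), and it combines the local results by multiplying Gaussian densities. All function and variable names are hypothetical.

```python
import random

def gaussian_posterior(xs, noise_var, prior_var):
    """Posterior (mean, variance) of a Gaussian mean under a N(0, prior_var) prior."""
    precision = 1.0 / prior_var + len(xs) / noise_var
    mean = (sum(xs) / noise_var) / precision
    return mean, 1.0 / precision

def combine_gaussians(subposteriors):
    """Multiply Gaussian densities: precisions add, means are precision-weighted."""
    total_precision = sum(1.0 / var for _, var in subposteriors)
    weighted_mean = sum(mean / var for mean, var in subposteriors)
    return weighted_mean / total_precision, 1.0 / total_precision

# Synthetic data (assumed for illustration only).
rng = random.Random(0)
noise_var, prior_var, num_machines = 1.0, 10.0, 4
data = [rng.gauss(2.0, noise_var ** 0.5) for _ in range(1000)]

# Distribute the data over machines; no communication is needed below.
shards = [data[i::num_machines] for i in range(num_machines)]

# Each machine uses the prior raised to the power 1/M, i.e. N(0, M * prior_var),
# so that the product of the local posteriors recovers the full-data posterior.
subposteriors = [
    gaussian_posterior(shard, noise_var, num_machines * prior_var)
    for shard in shards
]

# Post-inference step: combine the per-machine results.
combined_mean, combined_var = combine_gaussians(subposteriors)

# Reference: posterior computed on the full data set on a single machine.
full_mean, full_var = gaussian_posterior(data, noise_var, prior_var)
```

In this conjugate setting the combined result matches the full-data posterior up to floating-point error. With sample-based inference results, such as Markov chain Monte Carlo output, the combination step would instead have to work from samples, which this sketch does not attempt.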
