Abstract

Bayesian inversion generates a posterior distribution of model parameters from an observation equation and prior information, both weighted by hyperparameters. In fully Bayesian inversions, a prior is also introduced for the hyperparameters, enabling both the model parameters and the hyperparameters to be evaluated probabilistically through their joint posterior. However, even in a linear inverse problem, it remains unresolved how useful information on the model parameters should be extracted from the joint posterior. This study presents a theoretical exploration of the appropriate dimensionality reduction of the joint posterior in the fully Bayesian inversion. We classify the ways of reducing the probability distribution into three categories according to how the joint posterior is marginalized: (1) using the joint posterior without marginalization, (2) using the marginal posterior of the model parameters and (3) using the marginal posterior of the hyperparameters. First, we derive several analytical results that characterize these categories. One is a suite of semi-analytic representations of the probability maximization estimators for the respective categories in the linear inverse problem. The mode estimators of categories (1) and (2) are found to be asymptotically identical for a large number of data and model parameters. We also prove that the asymptotic distributions of categories (2) and (3) concentrate delta-functionally on their respective probability peaks, which predicts two distinct optimal estimates of the model parameters. Second, we conduct a synthetic test and find that an appropriate reduction is realized by category (3), typified by Akaike's Bayesian information criterion (ABIC). The other reduction categories are shown to be inappropriate for the case of many model parameters, where the probability concentration of the marginal posterior of the model parameters no longer implies the central limit theorem.
The main cause of these results is that the joint posterior peaks sharply at an underfitted or overfitted solution as the number of model parameters increases. The exponential growth of the probability space in the model-parameter dimension makes almost-zero-probability events contribute finitely to the posterior mean, rendering the distributions of categories (1) and (2) pathological. One remedy for this pathology is to count all model-parameter realizations by integrating the joint posterior over the model-parameter space of exponential multiplicity. Hence, the marginal posterior of the hyperparameters for category (3) becomes appropriate and can conform to the law of large numbers even with numerous model parameters. The exponential rarity of the posterior mean and ABIC estimates implies the exponential time complexity of ordinary Monte Carlo methods in computing the population mean and ABIC. We also present a geophysical application estimating a continuous strain-rate field from spatially discrete global navigation satellite system data, demonstrating that denser basis-function expansions of the model-parameter field lead to oversmoothed estimates in naive fully Bayesian approaches, while detailed fields are resolved convergently by the category (3) reduction. One often naively believes that a good solution can be constructed from a finite number of high-probability samples, but the high-probability domain can be inappropriate, and exponentially many samples become necessary for generating appropriate estimates in the high-dimensional fully Bayesian posterior probability space.
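The category (3) reduction described above can be illustrated on a toy linear-Gaussian inverse problem. The sketch below (not the paper's actual computation; all names, grids and the matrix `G` are hypothetical) maximizes the marginal likelihood of two hyperparameters, a noise variance and a prior precision ratio, after the model parameters have been integrated out analytically, then evaluates the model parameters by the conditional posterior mean at the selected hyperparameters, as in ABIC-based inversion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy linear inverse problem: d = G a + e, e ~ N(0, sigma2 I).
n, m = 50, 20
G = rng.standard_normal((n, m))
a_true = np.sin(np.linspace(0.0, 3.0 * np.pi, m))
d = G @ a_true + 0.3 * rng.standard_normal(n)

def neg_log_marginal_likelihood(alpha, sigma2):
    """-log p(d | alpha, sigma2) with prior a ~ N(0, (sigma2/alpha) I):
    integrating a out gives d ~ N(0, sigma2 * (I + G G^T / alpha))."""
    C = sigma2 * (np.eye(n) + G @ G.T / alpha)
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + d @ np.linalg.solve(C, d) + n * np.log(2.0 * np.pi))

# Category (3): scan the hyperparameter marginal posterior (flat hyperprior)
# on a grid and keep the maximizer.
alphas = np.logspace(-3, 3, 61)
sigma2s = np.logspace(-3, 1, 41)
nll, alpha_hat, sigma2_hat = min(
    ((neg_log_marginal_likelihood(al, s2), al, s2)
     for al in alphas for s2 in sigma2s),
    key=lambda t: t[0],
)
# ABIC = -2 * max log-marginal-likelihood + 2 * (number of hyperparameters).
abic = 2.0 * nll + 2.0 * 2

# Model parameters from the conditional posterior mean at the optimum,
# i.e. the damped least-squares (ridge) solution with penalty alpha_hat.
a_hat = np.linalg.solve(G.T @ G + alpha_hat * np.eye(m), G.T @ d)
print(f"alpha_hat={alpha_hat:.3g}, sigma2_hat={sigma2_hat:.3g}, ABIC={abic:.2f}")
```

Because the integration over the model-parameter space is done in closed form here, the exponential multiplicity of model-parameter realizations is accounted for at no sampling cost, which is precisely why this reduction avoids the Monte Carlo pathology the abstract describes.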
