Abstract

We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic (though, significantly, there are no outliers). As sample size increases, the posterior puts its mass on worse and worse models of ever higher dimension. This is caused by hypercompression, the phenomenon that the posterior puts its mass on distributions that have much larger KL divergence from the ground truth than their average, i.e. the Bayes predictive distribution. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the SafeBayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates, and regularizes more, as soon as hypercompression takes place. Its results on our data are quite encouraging.
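The generalized posterior mentioned above simply raises the likelihood in Bayes' theorem to a power η. The following is a minimal illustrative sketch, not the paper's actual experimental setup: it uses a toy grid-based Gaussian location model (homoskedastic model, heteroskedastic data) purely to show how η tempers the likelihood; all variable names and the specific noise mixture are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Misspecified setting: the model assumes N(theta, 1) (homoskedastic),
# while the data-generating noise is heteroskedastic.
# eta-generalized posterior on a grid:
#   posterior(theta) ∝ prior(theta) * likelihood(theta)^eta

thetas = np.linspace(-3.0, 3.0, 601)                     # parameter grid
log_prior = np.full_like(thetas, -np.log(len(thetas)))   # uniform prior

def generalized_posterior(x, eta):
    """Grid approximation of the eta-generalized posterior given data x."""
    # log-likelihood of each theta under N(theta, 1), summed over the sample
    loglik = -0.5 * ((x[:, None] - thetas[None, :]) ** 2).sum(axis=0)
    logpost = log_prior + eta * loglik
    logpost -= logpost.max()                              # numerical stability
    post = np.exp(logpost)
    return post / post.sum()

# Heteroskedastic data: noise scale depends on a hidden coin flip
n = 200
scale = np.where(rng.random(n) < 0.5, 0.1, 2.0)
x = rng.normal(0.0, scale)

post_std  = generalized_posterior(x, eta=1.0)   # standard Bayes
post_half = generalized_posterior(x, eta=0.5)   # tempered: wider posterior
```

With η < 1 the likelihood is downweighted relative to the prior, so the generalized posterior concentrates more slowly; in the paper's terminology, smaller η regularizes more.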

Highlights

  • We empirically demonstrate a form of inconsistency of Bayes factor model selection, model averaging and Bayesian ridge regression under model misspecification on a simple linear regression problem with random design

  • While our experiments focus on linear regression, the discussion holds for general conditional density models

  • Secondary conclusions: both types of SafeBayes converge quickly to the right model order, which is pleasing since they were not designed to achieve this


Summary

Introduction

We empirically demonstrate a form of inconsistency of Bayes factor model selection, model averaging and Bayesian ridge regression under model misspecification on a simple linear regression problem with random design. We show empirically that SafeBayes performs excellently in our regression setting, being competitive with standard Bayes if the model is correct and very significantly outperforming standard Bayes if it is not. We do this by providing a wide range of experiments, varying parameters of the problem such as the priors and the true regression function, and studying various performance indicators such as the squared-error risk, the posterior on the variance, etc. While both our experiments (as e.g. in Figure 2) and the implementation details of SafeBayes suggest a predictive–sequential setting, our results are just as relevant for the nonsequential setting of fixed-sample-size linear regression with random design, which is a standard statistical problem. In such settings, one would like guarantees which, for the fixed, given sample size n, give some indication of how 'close' our inferred distribution or parameter vector is to some 'true' or optimal vector. We discuss related work, pose several open problems, and tentatively propose a generic theory of (pseudo-Bayesian) inference under misspecification, parts of which have already been developed in the companion papers Grunwald and Mehta (2016) and Grunwald (2017).
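SafeBayes learns the learning rate η by sequential prediction: roughly, it scores each candidate η by the cumulative posterior-expected log-loss incurred when predicting each observation from the η-generalized posterior on the preceding data, and picks the minimizer. The sketch below illustrates that idea on an assumed toy Gaussian location model; the grid, the η candidates, and the noise model are illustrative choices made here, not the paper's actual algorithmic or experimental details.

```python
import numpy as np

rng = np.random.default_rng(1)

thetas = np.linspace(-3.0, 3.0, 301)                    # parameter grid
log_prior = np.full_like(thetas, -np.log(len(thetas)))  # uniform prior

def neg_log_lik(x_t):
    """Per-observation negative log-likelihood under N(theta, 1), per theta."""
    return 0.5 * np.log(2 * np.pi) + 0.5 * (x_t - thetas) ** 2

def safebayes_score(x, eta):
    """Cumulative posterior-expected log-loss of sequential eta-Bayes
    (a sketch of the R-SafeBayes-style criterion)."""
    logpost = log_prior.copy()
    total = 0.0
    for x_t in x:
        post = np.exp(logpost - logpost.max())
        post /= post.sum()
        total += post @ neg_log_lik(x_t)     # E_posterior[-log p_theta(x_t)]
        logpost -= eta * neg_log_lik(x_t)    # eta-weighted Bayesian update
    return total

# Heteroskedastic data, as in the misspecified setting of the paper
n = 300
scale = np.where(rng.random(n) < 0.5, 0.1, 2.0)
x = rng.normal(0.0, scale)

etas = [1.0, 0.5, 0.25, 0.1]
scores = {eta: safebayes_score(x, eta) for eta in etas}
eta_hat = min(scores, key=scores.get)        # learned learning rate
```

The paper's point is that when hypercompression occurs, this criterion penalizes η = 1 and selects a smaller learning rate, which in turn regularizes the posterior more.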

Preliminaries
KL-optimal distribution
A special case
KL-associated prediction tasks for the linear model
The generalized posterior
Instantiating generalized Bayes to linear model selection and averaging
Bayesian inconsistency from bad misspecification
Preparation
Hypercompression
How η-generalized Bayes for η < 1 can avoid bad misspecification
The SafeBayesian algorithm and how it finds the right η
Main experiment
The statistics we report
Conclusion
Experimental demonstration of hypercompression for standard Bayes
Second experiment
Findings
Executive summary
