Abstract

In data science, an unknown information source is estimated by a predictive distribution defined from a statistical model and a prior. In the older Bayesian framework, the Bayesian predictive distribution was held to be optimal under the assumptions that the statistical model is believed to be correct and the prior expresses a subjective belief in a small world. Such a restricted treatment of Bayesian inference, however, cannot be applied to the highly complicated statistical models and learning machines of a large world. In 1980, Akaike proposed a new scientific paradigm of Bayesian inference in which both the model and the prior are candidate systems that should be designed by mathematical procedures so that the predictive distribution better approximates the unknown information source. Nowadays, Akaike's proposal is widely accepted in statistics, data science, and machine learning. In this paper, in order to establish a mathematical foundation for developing measures of a statistical model and a prior, we show the relation among the generalization loss, the information criteria, and the cross-validation loss, and compare them from three different points of view. First, their performances are compared in singular problems, where the posterior distribution is far from any normal distribution. Second, they are studied in the case when a leverage sample point is contained in the data. Third, their stochastic properties are clarified when they are used for the prior optimization problem. The mathematical and experimental comparison shows the equivalence and the difference among them, which we expect to be useful in practical applications.
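For reference, the three quantities compared can be written down explicitly. The following are the standard Bayesian definitions (our notation; WAIC as defined by Watanabe (2010), and the importance-sampling form of the cross-validation loss by Gelfand et al. (1992)), where \mathbb{E}_w[\cdot] and \mathbb{V}_w[\cdot] denote the posterior mean and variance over the parameter w, and X is a fresh sample from the information source:

    G_n = -\mathbb{E}_X\Bigl[ \log \mathbb{E}_w\bigl[ p(X \mid w) \bigr] \Bigr]

    \mathrm{WAIC} = -\frac{1}{n} \sum_{i=1}^{n} \log \mathbb{E}_w\bigl[ p(x_i \mid w) \bigr] + \frac{1}{n} \sum_{i=1}^{n} \mathbb{V}_w\bigl[ \log p(x_i \mid w) \bigr]

    \mathrm{CV} = \frac{1}{n} \sum_{i=1}^{n} \log \mathbb{E}_w\bigl[ p(x_i \mid w)^{-1} \bigr]

Both WAIC and CV are computable from the sample alone, and both estimate the generalization loss G_n; the comparisons in this paper concern when, and how precisely, they agree.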

Highlights

  • In data science, we estimate an unknown information source by a predictive distribution defined from a statistical model and a prior

  • In the older framework of Bayesian statistics of the 20th century, it was assumed that the statistical model is believed to be correct and the prior is given by a subjective belief, with the result that the predictive distribution was regarded as a subjectively optimal solution without any check or test

  • A new paradigm was proposed by Akaike (1980a, 1980b): both the statistical model and the prior are understood as mere candidate systems, and they should be optimized so that the predictive distributions become better approximations of the unknown probability distribution



Introduction

We estimate an unknown information source by a predictive distribution defined from a statistical model and a prior. To develop scientific evaluation measures of both a statistical model and a prior, we study the mathematical relation among the generalization loss, the information criteria AIC by Akaike (1974), DIC by Spiegelhalter et al. (2002), and WAIC by Watanabe (2010), and the leave-one-out cross-validation loss by Gelfand et al. (1992) and Vehtari and Lampinen (2002) in Bayesian inference, from the following three points of view. First, they are compared under the singular condition, in which the posterior distribution cannot be approximated by any normal distribution; second, in the case when an influential (leverage) observation is contained in the data; and third, in the prior optimization problem. Their statistical differences are then discussed, and in the last section, we conclude the paper.
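As a concrete illustration (a minimal sketch, not code from the paper), both WAIC and the importance-sampling form of leave-one-out cross-validation can be computed from the same (S, n) matrix of pointwise log-likelihoods log p(x_i | w_s) evaluated at S posterior draws w_s; the function name waic_and_loo and the toy normal-mean example are our own assumptions.

    import numpy as np

    def waic_and_loo(log_lik):
        """Per-datum WAIC and importance-sampling LOO-CV losses.

        log_lik: array of shape (S, n) holding log p(x_i | w_s) for S
        posterior draws and n observations. Lower values are better.
        """
        S, _ = log_lik.shape
        # log pointwise predictive density: log E_w[p(x_i | w)], via logsumexp
        lppd = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
        # functional variance V_w[log p(x_i | w)] for each data point
        fvar = log_lik.var(axis=0, ddof=1)
        waic = -lppd.mean() + fvar.mean()
        # IS-LOO: p(x_i | X without x_i) ~ 1 / E_w[1 / p(x_i | w)]
        cv = (np.logaddexp.reduce(-log_lik, axis=0) - np.log(S)).mean()
        return waic, cv

    # Toy check: normal-mean model with known variance 1 and a flat prior,
    # so the posterior of mu is N(mean(x), 1/n)  (hypothetical data).
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=50)
    mu = rng.normal(x.mean(), 1.0 / np.sqrt(50.0), size=200)
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2
    print(waic_and_loo(log_lik))  # the two losses are close in this regular model

The importance-sampling form avoids refitting the model n times, but it is known to be sensitive to influential observations, which is one of the situations examined in the paper.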

Definitions of statistical inference
Asymptotic generalization loss
Information criteria and cross validation
Regular and singular cases
Influential observation
Prior optimization problem
Discussion
Conclusion