Abstract

A nonparametric Bayes procedure is proposed for testing the fit of a parametric model for a distribution. Alternatives to the parametric model are kernel density estimates. Data splitting makes it possible to use kernel estimates for this purpose in a Bayesian setting. A kernel estimate indexed by bandwidth is computed from one part of the data, a training set, and then used as a model for the rest of the data, a validation set. A Bayes factor is calculated from the validation set by comparing the marginal for the kernel model with the marginal for the parametric model of interest. A simulation study is used to investigate how large the training set should be, and examples involving astronomy and wind data are provided. A proof of Bayes consistency of the proposed test is also provided.
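The data-splitting procedure described in the abstract can be sketched as follows. The Gaussian null model, Gaussian kernel, discrete uniform prior on the bandwidth, and plug-in fit of the parametric model are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(0)
x = rng.normal(size=200)            # illustrative data
train, valid = x[:100], x[100:]     # data splitting: training and validation sets

# Marginal for the parametric (here, normal) model on the validation set,
# using a plug-in fit from the training set as a simplification.
mu, sigma = train.mean(), train.std(ddof=1)
log_m0 = stats.norm.logpdf(valid, loc=mu, scale=sigma).sum()

# Kernel alternative: the kernel estimate is computed from the training set
# and indexed by bandwidth h; average its validation likelihood over a
# discrete uniform prior on h.
bandwidths = np.linspace(0.1, 1.0, 10)
log_like_h = []
for h in bandwidths:
    # Gaussian-kernel density estimate from the training set,
    # evaluated at each validation point.
    dens = stats.norm.pdf(valid[:, None], loc=train[None, :], scale=h).mean(axis=1)
    log_like_h.append(np.log(dens).sum())

# Marginal for the kernel model: prior-weighted average, via logsumexp for stability.
log_m1 = logsumexp(log_like_h) - np.log(len(bandwidths))

# Log Bayes factor; positive values favor the parametric model.
log_bf = log_m0 - log_m1
```

Because the data here are truly normal, the Bayes factor will typically favor the parametric model, consistent with the exponential convergence under the null discussed in the highlights.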

Highlights

  • Nonparametric testing of the fit of a parametric model for a distribution has a long and rich history in frequentist statistics; see, e.g., Rayner et al (2009). The literature on Bayesian goodness-of-fit tests is much smaller.

  • The purpose of the current paper is to introduce a Bayesian approach to goodness of fit that has the virtues of (i) simplicity, and (ii) transparency to users unfamiliar with the somewhat daunting notions of Dirichlet processes and Pólya trees.

  • It is noteworthy that under the null hypothesis our Bayes factor converges to 0 at a rate of exp(−cn^η) for some 1/2 < η < 1. This is in contrast to typical results when testing a parametric null hypothesis against a nonparametric alternative, where under the null hypothesis the Bayes factor converges to 0 at slower than an exponential rate. (See, for example, McVinish et al (2009).) The reason for exponential convergence in our case is the inefficiency of the kernel estimate relative to the parametric model under H0.


Summary

Introduction

Nonparametric testing of the fit of a parametric model for a distribution has a long and rich history in frequentist statistics; see, e.g., Rayner et al (2009). Given a fitted parametric density, it seems natural to test the fit of this density by seeing how close it is to a kernel estimate. Such an approach in the frequentist realm dates at least to Bickel and Rosenblatt (1973). The Bayesian approach requires models for the underlying density that are well defined prior to data collection. If the null hypothesis is rejected in our approach, one may consider the kernel density estimate that led to this conclusion to seek guidance as to an appropriate parametric model for the underlying density. A proof of our theorem may be found in the Appendix.
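The kernel estimate referred to above has the standard form; the Gaussian choice of kernel shown here is illustrative, not a detail taken from the paper:

```latex
\hat{f}_h(x) = \frac{1}{mh}\sum_{i=1}^{m} K\!\left(\frac{x - X_i}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2},
```

where X_1, …, X_m denote the training-set observations and h > 0 is the bandwidth that indexes the kernel model.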

The method
Prior for the bandwidth
Choice of training set size
Analysis of planetary nebula luminosity data
Wind direction data
Bayes consistency
Concluding remarks