Abstract
In this work we propose a novel model prior for variable selection in linear regression. The idea is to determine the prior mass by considering the worth of each of the regression models, given the number of possible covariates under consideration. The worth of a model consists of the information loss and the loss due to model complexity. While the information loss is determined objectively, the loss due to model complexity is flexible, and the penalty on model size can even be customized to include prior knowledge. Several versions of the loss-based prior are proposed and compared empirically. Through simulation studies and real data analyses, we compare the proposed prior with the Scott and Berger prior in noninformative scenarios and with the Beta-Binomial prior in informative scenarios.
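For orientation, the two baseline priors named above can be written down in a few lines. The sketch below is not the loss-based prior proposed in this work; it only shows the standard forms of the comparison priors, assuming p candidate covariates and a model that includes k of them. The function names and the hyperparameters `a` and `b` are illustrative choices, not notation taken from the paper.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_prior(k, p, a, b):
    """Prior mass of ONE specific model with k of p covariates,
    when each inclusion probability is theta ~ Beta(a, b):
    B(a + k, b + p - k) / B(a, b)."""
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))


def scott_berger_prior(k, p):
    """Scott and Berger prior: the Beta-Binomial prior with a = b = 1,
    which simplifies to 1 / ((p + 1) * C(p, k))."""
    return 1.0 / ((p + 1) * comb(p, k))


# Illustrative check with p = 4 covariates: the mass summed over all
# 2^p models (C(p, k) models of each size k) equals one.
p = 4
total_mass = sum(comb(p, k) * scott_berger_prior(k, p) for k in range(p + 1))
```

Under the Scott and Berger form, every model size k receives the same total mass 1/(p + 1), which is then split evenly among the C(p, k) models of that size. Choosing other values of `a` and `b` shifts mass toward smaller or larger models, which is how the Beta-Binomial prior can encode prior information.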
Highlights
In this paper, we propose a method to derive model prior probabilities for variable selection problems in linear regression.
With a prior distribution on the space of models, representing the model uncertainty related to variable selection, one way to proceed is by using Bayesian model averaging (Hoeting et al., 1999).
When the model posterior distribution tends to be spread across many of the possible regression models, and when prediction is an important part of the statistical analysis, Raftery et al. (1997) show that Bayesian model averaging performs better than choosing the regression model with the highest posterior probability.
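The contrast between Bayesian model averaging and selecting the single most probable model can be sketched as follows. This is a minimal illustration, assuming that posterior model probabilities and a point prediction from each model are already available; the function names and all numbers are hypothetical and do not come from the paper.

```python
def bma_prediction(post_probs, predictions):
    """Average the per-model predictions, weighted by posterior model probability."""
    total = sum(post_probs)  # normalise in case the weights do not sum to one
    return sum((w / total) * y for w, y in zip(post_probs, predictions))


def top_model_prediction(post_probs, predictions):
    """Prediction of the single model with the highest posterior probability."""
    best = max(range(len(post_probs)), key=lambda i: post_probs[i])
    return predictions[best]


# Hypothetical posterior spread across three models, none of them dominant.
post_probs = [0.40, 0.35, 0.25]
predictions = [1.0, 2.0, 3.0]

averaged = bma_prediction(post_probs, predictions)
selected = top_model_prediction(post_probs, predictions)
```

When the posterior mass is spread out as in this toy input, the selected model ignores the 60% of posterior probability held by the other models, whereas the averaged prediction reflects all of them.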
Summary
We propose a method to derive model prior probabilities for variable selection problems in linear regression. When the model posterior distribution tends to be spread across many of the possible regression models, and when prediction is an important part of the statistical analysis, Raftery et al. (1997) show that Bayesian model averaging performs better than choosing the regression model with the highest posterior probability. The fact that a regression model has been chosen to be part of the model space (i) conveys information and (ii) induces complexity; as such, we can measure the loss in information carried by a model and the loss due to its complexity. These losses form the basis for determining the worth of the model and, from it, the model prior probability.