Abstract

In this work we propose a novel model prior for variable selection in linear regression. The idea is to determine the prior mass by considering the worth of each of the regression models, given the number of possible covariates under consideration. The worth of a model consists of the information loss and the loss due to model complexity. While the information loss is determined objectively, the loss expression due to model complexity is flexible, and the penalty on model size can even be customized to include prior knowledge. Several versions of the loss-based prior are proposed and compared empirically. Through simulation studies and real data analyses, we compare the proposed prior to the Scott and Berger prior, for noninformative scenarios, and to the Beta-Binomial prior, for informative scenarios.
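To make the two benchmark priors mentioned above concrete, the following sketch computes the prior mass assigned to a single model with k of p covariates under the Scott-Berger prior and under a Beta-Binomial prior on model size. This is an illustrative implementation based on the standard definitions of those priors, not code from the paper; the parameter names are our own.

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def scott_berger_prior(k, p):
    """Prior mass of one model with k of p covariates under the
    Scott-Berger multiplicity-correcting prior: uniform over model
    sizes, then uniform over models of each size."""
    return 1.0 / ((p + 1) * comb(p, k))

def beta_binomial_prior(k, p, a=1.0, b=1.0):
    """Prior mass of one model with k covariates when model size
    follows a Beta-Binomial(a, b) distribution; a = b = 1 recovers
    the Scott-Berger prior."""
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))
```

Summing each prior over all 2^p models (there are C(p, k) models of size k) returns 1, which is a quick sanity check on the normalization.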

Highlights

  • In this paper, we propose a method to derive model prior probabilities for variable selection problems in linear regression

  • With a prior distribution on the space of models, representing the model uncertainty related to variable selection, one way to proceed is by using Bayesian model averaging (Hoeting et al., 1999)

  • The model posterior distribution tends to be spread across many of the possible regression models, and when prediction is an important part of the statistical analysis, Raftery et al. (1997) show that Bayesian model averaging performs better than choosing the regression model with the highest posterior probability


Introduction

We propose a method to derive model prior probabilities for variable selection problems in linear regression. The model posterior distribution tends to be spread across many of the possible regression models, and when prediction is an important part of the statistical analysis, Raftery et al. (1997) show that Bayesian model averaging performs better than choosing the regression model with the highest posterior probability. The fact that a regression model has been chosen to be part of the model space (i) conveys information and (ii) induces complexity; as such, we can measure the loss in information carried by a model and the loss due to its complexity. These losses will form the basis to determine the worth of the model and the model prior probability.
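The complexity-loss component of the construction can be sketched as a prior mass that decays exponentially in model size, normalized over all 2^p candidate models. The constant c and the omission of the information-loss term are simplifying assumptions for illustration only; the paper's full prior also accounts for the information each model carries.

```python
from math import comb, exp

def size_penalized_prior(p, c=1.0):
    """Illustrative sketch of a complexity-penalizing model prior:
    pi(M) proportional to exp(-c * |M|), so larger models lose more
    worth. Returns the prior mass of any single model of size k,
    for k = 0, ..., p. (Information-loss term omitted for brevity.)"""
    # normalizing constant: C(p, k) models of each size k
    total = sum(comb(p, k) * exp(-c * k) for k in range(p + 1))
    return [exp(-c * k) / total for k in range(p + 1)]
```

Larger c penalizes model size more aggressively; c = 0 makes the prior uniform over all models, so c is where customized prior knowledge about sparsity could enter.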

Notation and problem specification
Model priors in objective variable selection
Model prior based on losses
Setting the constant c
Simulation study
Non-informative simulation
Informative simulation
Illustrative examples with real data sets
Hald data
Large data set analysis
Discussion