Abstract

We address the problem of Bayesian variable selection for high-dimensional linear regression. We consider a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector, which encodes the sparsity of the problem, with a random Gaussian parameter vector. The originality of the work is to perform inference by relaxing the model and maximizing a type-II log-likelihood with an EM algorithm. Model selection is then carried out relying on Occam’s razor and on a path of models produced by the EM algorithm. Numerical comparisons between our method, called spinyReg, and state-of-the-art high-dimensional variable selection algorithms (such as the lasso, the adaptive lasso, stability selection and spike-and-slab procedures) are reported. Competitive variable selection results and predictive performance are achieved on both simulated and real benchmark data sets. An original regression data set involving the prediction of the number of visitors to the Orsay museum in Paris using bike-sharing system data is also introduced, illustrating the efficiency of the proposed approach. The R package spinyReg implementing the method proposed in this paper is available on CRAN.
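To fix ideas, here is a sketch of the kind of generative model the abstract describes; the notation (X, z, w, alpha, sigma) is assumed for illustration and may not match the paper's exactly. With X the n x p design matrix, z in {0,1}^p the deterministic binary vector and Z = diag(z), the model reads

    y \mid w \sim \mathcal{N}(X Z w,\ \sigma^2 I_n), \qquad w \sim \mathcal{N}(0,\ \alpha^{-1} I_p),

so that, marginally, y \sim \mathcal{N}(0,\ \alpha^{-1} X Z^2 X^\top + \sigma^2 I_n). Type-II (marginal) log-likelihood maximization then optimizes this evidence over z, relaxed from {0,1}^p to [0,1]^p, together with alpha and sigma^2, using an EM algorithm that treats w as the latent variable.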

Highlights

  • Over the past decades, parsimony has emerged as a very natural way to deal with high-dimensional data spaces (Candes, 2014)

  • We consider the problem of Bayesian variable selection for high-dimensional linear regression through a sparse generative model

Summary

Introduction

Parsimony has emerged as a very natural way to deal with high-dimensional data spaces (Candes, 2014). In the context of linear regression, finding a parsimonious parameter vector can prevent overfitting, make an ill-posed problem (such as a “large p, small n” situation) tractable, and allow the data to be interpreted by identifying which predictors are relevant. The problem of finding such predictors is referred to as sparse regression or variable selection and has mainly been addressed either by penalizing the likelihood of the data, or by using Bayesian models.
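As a point of reference for the first route, the penalized-likelihood approach is typified by the lasso; the formula below is the standard lasso objective, not a result specific to this paper:

    \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \ \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \|\beta\|_1,

where the \ell_1 penalty drives some coefficients exactly to zero, so the selected variables are those with nonzero entries in \hat{\beta}. The Bayesian route instead places a sparsity-inducing prior on \beta, such as a spike-and-slab mixture, and reads the selected variables off the posterior.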

Penalized likelihood
Bayesian modelling
Our approach
Notation
The model
Posterior distribution
Links with spike-and-slab models
Inference strategy and relaxation
E-step
M-step
Links with automatic relevance determination
Model selection
Occam’s Razor
Prediction
Initialization
Computational cost
Path of Models
Simulation setup
An introductory example
Benchmark study on simulated data
Study on classical regression data sets
Predicting a touristic index using open data
The “OrsayVelib” database
Results
Conclusion