Abstract
We address the problem of Bayesian variable selection for high-dimensional linear regression. We consider a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector, which encodes the sparsity of the problem, with a random Gaussian parameter vector. The originality of the work is to perform inference by relaxing the model and maximizing a type-II log-likelihood with an EM algorithm. Model selection is performed afterwards, relying on Occam's razor and on the path of models produced by the EM algorithm. Numerical comparisons between our method, called spinyReg, and state-of-the-art high-dimensional variable selection algorithms (such as the lasso, the adaptive lasso, stability selection, and spike-and-slab procedures) are reported. Competitive variable selection results and predictive performances are achieved on both simulated and real benchmark data sets. An original regression data set involving the prediction of the number of visitors of the Orsay museum in Paris using bike-sharing system data is also introduced, illustrating the efficiency of the proposed approach. The R package spinyReg implementing the method proposed in this paper is available on CRAN.
Highlights
Over the past decades, parsimony has emerged as a very natural way to deal with high-dimensional data spaces (Candes, 2014).
We consider the problem of Bayesian variable selection for high-dimensional linear regression through a sparse generative model.
Summary
Parsimony has emerged as a very natural way to deal with high-dimensional data spaces (Candes, 2014). In the context of linear regression, finding a parsimonious parameter vector can prevent overfitting, make an ill-posed problem (such as a "large p, small n" situation) tractable, and allow interpretation of the data by identifying which predictors are relevant. The problem of finding such predictors is referred to as sparse regression or variable selection, and has mainly been addressed either by penalizing the likelihood of the data or by using Bayesian models.
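The generative model sketched in the abstract can be illustrated with a small simulation. This is a hedged, minimal sketch in Python rather than the paper's R implementation: the variable names (`z`, `w`, `beta`), the dimensions, and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, k = 50, 200, 5          # n observations, p predictors, k relevant ones (illustrative)
X = rng.normal(size=(n, p))   # design matrix

# Deterministic binary vector z encoding which predictors are relevant.
z = np.zeros(p)
z[:k] = 1.0

# Random Gaussian parameter vector w (the "slab" part of the prior).
w = rng.normal(scale=2.0, size=p)

# The elementwise product z * w yields a sparse regression vector,
# giving the spike-and-slab-like prior on the coefficients.
beta = z * w

# Observations generated with additive Gaussian noise.
y = X @ beta + rng.normal(scale=0.5, size=n)

print(np.count_nonzero(beta))  # at most k nonzero coefficients
```

The point of the construction is that sparsity lives entirely in the deterministic vector `z`, while the Gaussian vector `w` carries the coefficient magnitudes; inference on `z` (relaxed to continuous values in the paper) is what the EM-based type-II likelihood maximization targets.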