Abstract

This paper presents a method for fast Bayesian variable selection in the normal linear regression model with high-dimensional data. A novel approach is adopted in which an explicit posterior probability for including a covariate is obtained. The method is sequential but not order dependent: covariates are dealt with one at a time, and a spike and slab prior is assigned only to the coefficient under investigation. We adopt the well-known spike and slab Gaussian priors with a sample-size-dependent variance, which achieve strong selection consistency for marginal posterior probabilities even when the number of covariates grows almost exponentially with sample size. Numerical illustrations show that the new approach provides essentially equivalent results to the standard spike and slab priors, i.e. the same marginal posterior probabilities of the coefficients being nonzero, which under the standard approach are estimated via Gibbs sampling. Hence, we obtain the same results via the direct calculation of $p$ probabilities rather than a stochastic search over a space of $2^{p}$ elements. Because these $p$ probabilities can be calculated exactly, parallel computation is feasible when $p$ is large.
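To make the idea of "one explicit probability per covariate" concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the paper's formula, which the abstract does not state. It assumes a known error variance `sigma_sq`, treats each covariate in a single-covariate regression that ignores all others, and uses placeholder variances `tau0_sq` and `tau1_sq` for the Gaussian spike and slab. The function name `marginal_inclusion_probs` and all default values are hypothetical. With these assumptions the Gaussian marginal likelihood has a closed form, so each of the $p$ probabilities is computed exactly and independently of the rest.

```python
import numpy as np


def marginal_inclusion_probs(X, y, sigma_sq=1.0, prior_prob=0.1,
                             tau0_sq=None, tau1_sq=None):
    """Per-covariate posterior inclusion probabilities (illustrative only).

    Assumed toy model for covariate j, ignoring all other covariates:
        y = x_j * beta_j + eps,          eps ~ N(0, sigma_sq * I)
        beta_j | spike ~ N(0, sigma_sq * tau0_sq)
        beta_j | slab  ~ N(0, sigma_sq * tau1_sq)
        P(slab) = prior_prob
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Placeholder sample-size-dependent variances (assumptions, not the
    # paper's choices): the spike shrinks and the slab widens as n grows.
    if tau0_sq is None:
        tau0_sq = 1.0 / n
    if tau1_sq is None:
        tau1_sq = max(np.log(n), 1.0)

    s = np.einsum("ij,ij->j", X, X)  # x_j' x_j for every column j
    r = X.T @ y                      # x_j' y  for every column j

    def log_marginal(tau_sq):
        # Log marginal likelihood of y up to a constant shared by spike
        # and slab, from y ~ N(0, sigma_sq * (I + tau_sq * x_j x_j')).
        scale = 1.0 + tau_sq * s
        return -0.5 * np.log(scale) + tau_sq * r**2 / (2.0 * sigma_sq * scale)

    log_odds = (np.log(prior_prob / (1.0 - prior_prob))
                + log_marginal(tau1_sq) - log_marginal(tau0_sq))
    # Numerically stable logistic function of the posterior log odds.
    return np.exp(-np.logaddexp(0.0, -log_odds))


# Usage on simulated data: only the first covariate carries signal.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 20))
y_demo = 2.0 * X_demo[:, 0] + rng.standard_normal(200)
probs_demo = marginal_inclusion_probs(X_demo, y_demo)
```

Every entry of the result depends only on its own column of `X`, so the computation can be split across workers without communication. A faithful implementation would need the paper's actual prior specification and its treatment of the remaining covariates.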

Highlights

  • Variable selection for the linear model is currently a topic of immense interest

  • In genetic studies, where the response variable corresponds to a particular observable trait, the number of subjects n may be of order $10^{3}$, while the number of genetic features p can be of order $10^{5}$

  • We illustrate that our method provides results that are essentially equivalent to Bayesian shrinking and diffusing (BASAD) in two ways: numerically through simulation studies, and theoretically through proving the same strong selection consistency with similar conditions


Summary

Introduction

Variable selection for the linear model is currently a topic of immense interest. In this paper, we consider the Gaussian linear regression model in the high-dimensional setting. In genetic studies, where the response variable corresponds to a particular observable trait, the number of subjects n may be of order $10^{3}$, while the number of genetic features p can be of order $10^{5}$. The sparsity assumption says that even if the total number of covariates p grows with n, |S∗|, the number of nonzero coefficients, is fixed. Zeros may be added to the coefficient vector β as n increases, but no nonzero components. We do not need the indices in the set S∗ to be fixed, as long as |S∗| is fixed. This is a typical assumption in the variable selection literature on selection consistency with a diverging number of covariates, such as in [15] and [18].

