Abstract

In many applications with high dimensional covariates, the covariates are naturally structured into groups, and this structure can be exploited for efficient statistical inference. We propose a Bayesian hierarchical model with a spike and slab prior specification to perform group selection in high dimensional linear regression models. While several penalization methods and, more recently, some Bayesian approaches have been proposed for group selection, the theoretical properties of the Bayesian approaches have not been studied extensively. In this paper, we provide novel theoretical results for group selection consistency under spike and slab priors, which demonstrate that the proposed Bayesian approach has advantages over penalization approaches. Our theoretical results accommodate flexible conditions on the design matrix and apply to commonly used statistical models, such as nonparametric additive models, for which very limited theoretical results are available for Bayesian methods. A shotgun stochastic search algorithm is adopted for the implementation of our proposed approach. We illustrate through simulation studies that the proposed method has better group selection performance than a variety of existing methods.
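The abstract's group-level spike and slab prior can be illustrated with a minimal sketch: each group of coefficients is either set entirely to zero (the spike) or drawn from a diffuse normal slab whose variance grows with the sample size n, reflecting the sample-size-dependent slab discussed in the paper. The inclusion probability `pi`, slab scale `tau2`, and the exact scaling below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_group_coef(p_g, n, pi=0.5, tau2=1.0):
    """Sketch of a group spike-and-slab draw: with probability 1 - pi
    the whole group of p_g coefficients is zero (spike); otherwise the
    group is drawn from a normal slab whose variance scales with the
    sample size n. pi, tau2, and the scaling are illustrative."""
    if rng.random() < pi:
        return rng.normal(0.0, np.sqrt(n * tau2), size=p_g)  # slab draw
    return np.zeros(p_g)  # spike: the entire group is excluded
```

Because selection operates on whole groups, a single draw either zeroes out all coefficients in a group or activates all of them together, which is the key difference from coefficient-wise spike and slab priors.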

Highlights

  • Variable selection is a crucial statistical tool, especially in high dimensional data settings, as it provides interpretability of the learned model and often helps to improve prediction power by removing irrelevant predictors

  • We first introduce the following notation to be used in our theoretical results: for operations on model indices, given a regression model indexed by k, we use k^c = 1_G − k as the index of its complementary model

  • To test the performance of our method, we compare it with existing methods including adaptive group least absolute shrinkage and selection operator (Lasso), group Lasso, group smoothly clipped absolute deviation (SCAD), group minimax concave penalty (MCP), and BGL-SS under different settings
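The model-index notation in the highlights treats a model as a binary inclusion vector over the G groups, with the complementary model obtained as k^c = 1_G − k. A minimal illustration (the specific values of G and k below are hypothetical):

```python
import numpy as np

# A model is indexed by a binary vector k over the G groups; the
# complementary model k^c = 1_G - k flips each group's inclusion
# indicator. G and k here are illustrative values.
G = 5
k = np.array([1, 0, 1, 0, 0])    # groups 1 and 3 included
k_c = np.ones(G, dtype=int) - k  # complementary model index
```

By construction k and k^c partition the groups: every group is included in exactly one of the two models.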

Summary

Introduction

Variable selection is a crucial statistical tool, especially in high dimensional data settings, as it provides interpretability of the learned model and often helps to improve prediction power by removing irrelevant predictors. The group Lasso estimator (Yuan and Lin, 2006) was proposed to perform group selection and is defined as the minimizer of an objective that combines the squared-error loss with an L1 penalty applied to the L2 norms of the group coefficients, a natural extension of the Lasso (Tibshirani, 1996). We propose spike and slab priors to perform model selection at the group level in the Bayesian framework. Following the same idea as Narisetty and He (2014), we suggest that the slab prior should be sample size dependent to achieve appropriate shrinkage. With this specification, our proposed method is shown to have strong selection consistency under more general designs.
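The group Lasso objective described above can be sketched concretely. Writing the groups as index sets with sizes p_g, the standard form from Yuan and Lin (2006) is 0.5‖y − Xβ‖² + λ Σ_g √p_g ‖β_g‖₂; the √p_g weights (the usual choice for groups of unequal size) and the function below are illustrative, not the paper's implementation.

```python
import numpy as np

def group_lasso_objective(y, X, beta, groups, lam):
    """Group Lasso objective (Yuan and Lin, 2006):
    0.5 * ||y - X @ beta||_2^2 + lam * sum_g sqrt(p_g) * ||beta_g||_2,
    where `groups` is a list of coefficient-index lists, one per group."""
    resid = y - X @ beta
    penalty = sum(
        np.sqrt(len(idx)) * np.linalg.norm(beta[idx])  # L2 norm per group
        for idx in groups
    )
    return 0.5 * resid @ resid + lam * penalty
```

Applying the L1-type penalty to whole-group L2 norms makes the minimizer set entire groups of coefficients to zero at once, which is exactly the group-level sparsity the paper's Bayesian prior also targets.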

Bayesian Group Selection
Posterior Distribution of Z
Theoretical Results
Applications to Specific Statistical Models
Nonparametric Additive Models
Seemingly Unrelated Regressions
Computation
Shotgun Stochastic Search Algorithm
Gibbs Sampling
Simulation Results
Application to Gene Expression Data
Conclusion