Abstract

We propose two multivariate extensions of the Bayesian group lasso for variable selection and estimation for data with high dimensional predictors and multi-dimensional response variables. The methods utilize spike and slab priors to yield solutions which are sparse at either a group level or both a group and individual feature level. The incorporation of group structure in a predictor matrix is a key factor in obtaining better estimators and identifying associations between multiple responses and predictors. The approach is suited to many biological studies where the response is multivariate and each predictor is embedded in some biological grouping structure such as gene pathways. Our Bayesian models are connected with penalized regression, and we prove both oracle and asymptotic distribution properties under an orthogonal design. We derive efficient Gibbs sampling algorithms for our models and provide the implementation in a comprehensive R package called MBSGS available on the Comprehensive R Archive Network (CRAN). The performance of the proposed approaches is compared to state-of-the-art variable selection strategies on simulated data sets. The proposed methodology is illustrated on a genetic dataset in order to identify markers grouping across chromosomes that explain the joint variability of gene expression in multiple tissues.

Highlights

  • In this article, we consider the challenging task of developing a fully Bayesian sparse regression analysis for the setting where the number of predictors exceeds the number of observations, the response is multivariate, and the covariates are grouped into blocks, with sparsity both at the block level and within blocks

  • A second simulation study with a multivariate response was performed to demonstrate the good prediction and variable selection performance of MBGL-SS and MBSGS-SS compared with BGL-SS and BSGS-SS (Bayesian sparse group selection with spike and slab priors, defined in Xu and Ghosh (2015)) and two lasso methods

  • We have proposed Bayesian methods for group-sparse modeling in the context of a multivariate correlated response variable


Summary

Introduction

We consider the challenging task of developing a fully Bayesian sparse regression analysis for the setting where the number of predictors exceeds the number of observations, the response is multivariate, and the covariates are grouped into blocks, with sparsity both at the block level and within blocks. A frequentist way to tackle this problem is offered by Friedman et al. (2010), who suggest using a group lasso penalty to select variables that are related to all components of the response Y. Extensions of this approach have been proposed to provide simultaneous estimation of the precision matrix (the inverse of the covariance matrix Σ) and of the regression coefficients (see, e.g., Rothman et al. (2010), Lee and Liu (2012), Cai et al. (2013)). We exploit and extend their approach for multivariate responses and propose the following hierarchical multivariate Bayesian group lasso model with an independent spike and slab prior for each group variable B_g:

\[
\begin{aligned}
Y \mid X, B, \Sigma &\sim MN_{n \times q}(XB, \Sigma, I_n), \\
\mathrm{Vec}(B_g^T \mid \Sigma, \tau_g, \pi_0) &\overset{ind}{\sim} (1 - \pi_0)\, N_{m_g q}\!\left(0,\; I_{m_g} \otimes \tau_g^2 \Sigma\right) + \pi_0\, \delta_0\!\left(\mathrm{Vec}(B_g^T)\right), \\
\tau_g^2 &\overset{ind}{\sim} \mathrm{Gamma}\!\left(\frac{m_g q + 1}{2},\; \frac{\lambda_g^2}{2}\right).
\end{aligned}
\]
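To make the group-level spike and slab prior concrete, the following is a minimal sketch of drawing one coefficient block B_g from it, using only the Python standard library. It is an illustration under stated assumptions, not the MBSGS implementation: the function names `cholesky` and `draw_group` are hypothetical, Σ is treated as known, the second Gamma argument is assumed to be a rate, and the prior on τ_g is read as a prior on τ_g². Because the slab covariance is I_{m_g} ⊗ τ_g²Σ, the rows of B_g are independent N_q(0, τ_g²Σ) draws.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_group(m_g, sigma_chol, lam, pi0, rng):
    """Draw one m_g x q block B_g from the group spike and slab prior (hypothetical helper)."""
    q = len(sigma_chol)
    # Spike: with probability pi0 the whole group is exactly zero.
    if rng.random() < pi0:
        return [[0.0] * q for _ in range(m_g)]
    # Slab scale (assumption: rate parametrization), tau_g^2 ~ Gamma((m_g*q + 1)/2, rate = lam^2/2).
    # random.gammavariate expects a scale, so scale = 2 / lam^2.
    tau = math.sqrt(rng.gammavariate((m_g * q + 1) / 2.0, 2.0 / lam ** 2))
    block = []
    for _ in range(m_g):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Each row is N_q(0, tau_g^2 * Sigma), built from the Cholesky factor of Sigma.
        row = [tau * sum(sigma_chol[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        block.append(row)
    return block


# Illustrative values only: q = 2 responses and three groups of sizes 3, 2 and 4.
rng = random.Random(0)
sigma = [[1.0, 0.5], [0.5, 1.0]]
chol = cholesky(sigma)
blocks = [draw_group(m, chol, lam=1.0, pi0=0.5, rng=rng) for m in (3, 2, 4)]
```

Each element of `blocks` is either entirely zero (the group is excluded) or dense (the group is included), which is the group-level sparsity pattern the prior is designed to produce.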

Connection to penalized regression and alternate reformulation of the model
Median thresholding estimator
Gibbs sampler
Simulation studies
Univariate setting
Multivariate setting
Application to real data
Summary statistics
Concluding remarks