A simple new approach to variable selection in regression, with application to genetic fine mapping.

Gao Wang, Peter Carbonetto, Matthew Stephens, Abhishek Sarkar

doi:10.1111/rssb.12388

Abstract

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the "Sum of Single Effects" (SuSiE) model, which comes from writing the sparse vector of regression coefficients as a sum of "single-effect" vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, Iterative Bayesian Stepwise Selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential, and the challenges, of applying these methods to generic variable selection problems.
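The model and fitting procedure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' software: it assumes centered X and y, a known residual variance `sigma2`, a fixed prior effect variance `prior_var`, and a uniform prior over which variable carries each single effect, whereas the paper estimates the variances by empirical Bayes. The function names `single_effect_regression` and `ibss` are hypothetical. Each single-effect regression (SER) returns a weight vector `alpha` over variables; IBSS then cycles through the L effects, refitting each one against the residual left by the others.

```python
import numpy as np


def single_effect_regression(X, r, sigma2, prior_var):
    """Single-effect regression (SER) of residuals r on the columns of X.

    Assumptions (for illustration only): uniform prior over which variable
    has the effect, N(0, prior_var) prior on its size, known sigma2.
    """
    xtx = np.sum(X ** 2, axis=0)        # x_j' x_j for each column
    bhat = X.T @ r / xtx                # per-variable least-squares estimate
    s2 = sigma2 / xtx                   # its sampling variance
    z2 = bhat ** 2 / s2
    # log Bayes factor: "variable j has the effect" vs "no effect"
    log_bf = 0.5 * np.log(s2 / (s2 + prior_var)) + 0.5 * z2 * prior_var / (s2 + prior_var)
    # posterior weights alpha_j proportional to BF_j (uniform prior), computed stably
    w = np.exp(log_bf - log_bf.max())
    alpha = w / w.sum()
    # conditional posterior of the effect size given that variable j is selected
    post_var = 1.0 / (1.0 / s2 + 1.0 / prior_var)
    post_mean = post_var * bhat / s2
    return alpha, post_mean, post_var


def ibss(X, y, L=5, sigma2=1.0, prior_var=1.0, n_iter=100, tol=1e-8):
    """Iterative Bayesian stepwise selection with fixed (assumed) hyperparameters."""
    n, p = X.shape
    alpha = np.zeros((L, p))            # alpha[l, j]: weight of variable j in effect l
    mu = np.zeros((L, p))               # mu[l, j]: posterior mean of effect l if it is j
    for _ in range(n_iter):
        old = alpha.copy()
        fitted = X @ np.sum(alpha * mu, axis=0)   # X times the sum of all effects' means
        for l in range(L):
            # drop effect l to obtain the expected residual it should explain
            fitted -= X @ (alpha[l] * mu[l])
            alpha[l], mu[l], _ = single_effect_regression(X, y - fitted, sigma2, prior_var)
            fitted += X @ (alpha[l] * mu[l])
        if np.max(np.abs(alpha - old)) < tol:
            break
    return alpha, mu
```

Calling `ibss(X, y, L=3)` on data where two columns are near-duplicates and one of them has a real effect should typically give one effect whose `alpha` is split between those two columns, which is the kind of "either x1 or x2" uncertainty the Credible Sets are meant to report.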

Highlights

The need to identify, or “select”, relevant variables in regression models arises in a diverse range of applications, and has spurred development of a correspondingly diverse range of methods.
We provide a principled justification for the Iterative Bayesian Stepwise Selection (IBSS) algorithm by showing that it optimizes a variational approximation to the posterior distribution under the “Sum of Single Effects” (SuSiE) model.
Some non-trivial differences in posterior inclusion probability (PIP) are clearly visible in Figure 2A. Visual inspection of these differences suggests that the SuSiE PIPs may better distinguish effect variables from non-effect variables, in that there appears to be a higher ratio of red to gray points below the diagonal than above it.
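In a SuSiE fit, each of the L single effects carries a weight vector α_l over the p variables, and a variable's PIP combines these as PIP_j = 1 − ∏_l (1 − α_lj), the probability that at least one effect selects variable j. A minimal NumPy sketch, assuming the weights are stored as an L × p array; the function name `pip` is hypothetical:

```python
import numpy as np


def pip(alpha):
    """PIP of each variable from an L x p array of single-effect weights.

    PIP_j = 1 - prod_l (1 - alpha[l, j]).
    """
    alpha = np.asarray(alpha, dtype=float)
    return 1.0 - np.prod(1.0 - alpha, axis=0)
```

For example, weights `[[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]` give PIPs of 0.5, 0.6 and 0.8: the middle variable gains inclusion probability from both effects even though neither effect favors it strongly.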

Summary

INTRODUCTION

The need to identify, or “select”, relevant variables in regression models arises in a diverse range of applications, and has spurred development of a correspondingly diverse range of methods (e.g., see O’Hara and Sillanpää, 2009; Fan and Lv, 2010; Desboulets, 2018; George and McCulloch, 1997, for reviews). When variables are highly correlated, quantifying uncertainty in which variables should be selected requires methods that can draw conclusions such as “either x1 or x2 is relevant and we cannot decide which”, rather than methods that arbitrarily select one of the variables and ignore the other. While this may seem a simple goal, in practice most existing variable selection methods do not satisfactorily address this problem (see Section 2 for further discussion). A key feature of our method, which distinguishes it from most existing Bayesian variable selection regression (BVSR) methods, is that it produces “Credible Sets” of variables that quantify uncertainty in which variable should be selected when multiple, highly correlated variables compete with one another. These Credible Sets are designed to be as small as possible while still each capturing a relevant variable. We end with a discussion highlighting avenues for further work.
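One way to build a small Credible Set for a single effect l, in the spirit of the description above, is greedy: order variables by decreasing weight α_lj and keep adding them until the cumulative weight reaches the target coverage ρ (0.95 is a common choice). The sketch below assumes `alpha_l` is the weight vector from one fitted single effect; the function name `credible_set` is hypothetical, and the sketch omits any further filtering of the resulting sets.

```python
import numpy as np


def credible_set(alpha_l, rho=0.95):
    """Smallest set of variables whose single-effect weights sum to at least rho."""
    alpha_l = np.asarray(alpha_l, dtype=float)
    order = np.argsort(-alpha_l, kind="stable")   # variables by decreasing weight
    cum = np.cumsum(alpha_l[order])
    # first position where the cumulative weight reaches rho, plus one
    k = int(np.searchsorted(cum, rho)) + 1
    return sorted(order[:k].tolist())
```

With weights `[0.6, 0.3, 0.06, 0.04]` the set is variables 0, 1 and 2. With two near-duplicate variables that share the weight almost equally, both enter the set, which is how "either x1 or x2" is reported.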

A motivating toy example

Credible Sets

The single effect regression model

Posterior under SER model

Empirical Bayes for SER model

THE SUM OF SINGLE EFFECTS REGRESSION MODEL

Fitting SuSiE

IBSS computes a variational approximation to the SuSiE posterior distribution

Contrast to previous variational approximations

Posterior inference: posterior inclusion probabilities and Credible Sets

Choice of L

Identifiability and label-switching

NUMERICAL COMPARISONS

Illustrative example

Posterior inclusion probabilities

IBSS after 10 iterations

APPLICATION TO FINE-MAPPING SPLICING QTLS

AN EXAMPLE BEYOND FINE-MAPPING

DISCUSSION

DATA AND RESOURCES

Bayesian simple linear regression

Computing Credible Sets

Estimating hyperparameters

Empirical Bayes as a single optimization problem

Variational approximation

The additive effects model

Special case of SuSiE model

Proof of Corollary 1

Proof of Proposition 2

Computing the evidence lower bound

C CONNECTING SUSIE TO STANDARD BVSR

Simulated data

Software and hardware specifications for numerical comparisons study

Findings

E FUNCTIONAL ENRICHMENT OF SPLICE QTL FINE MAPPING


Journal: Journal of the Royal Statistical Society Series B: Statistical Methodology
Publication date: Jul 10, 2020
Citations: 521
License type: CC BY 4.0


Similar Papers


Comparison of Bayesian objective procedures for variable selection in linear regression
Elías Moreno ... F Javier Girón
TEST, Vol. 17, 05 Mar 2008

Bayesian Variable Selection in Regression with Networked Predictors
Feng Tai ... Wei Pan
01 Dec 2010

Bayes factor asymptotics for variable selection in the Gaussian process framework
Minerva Mukhopadhyay ... Sourabh Bhattacharya
Annals of the Institute of Statistical Mathematics, Vol. 74, 20 Sep 2021

Bayesian variable selection strategies in longitudinal mixture models and categorical regression problems
Md Nazir Uddin
04 Oct 2022
