Abstract

We study the Bayesian approach to variable selection for linear regression models. Motivated by recent work of Ročková and George (2014), we propose an EM algorithm that returns the MAP estimator of the set of relevant variables. Owing to its particular updating scheme, our algorithm can be implemented efficiently without inverting a large matrix at each iteration, and it therefore scales to big data. We also show that the MAP estimator returned by our EM algorithm achieves variable selection consistency even when p diverges with n. In practice, our algorithm can get stuck at local modes, a common problem with EM algorithms. To address this issue, we propose an ensemble EM algorithm, in which we repeatedly apply the EM algorithm to a subset of the samples with a subset of the covariates, and then aggregate the variable selection results across those bootstrap replicates. Empirical studies demonstrate the superior performance of the ensemble EM algorithm.
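
To make the aggregation step concrete, here is a minimal sketch of the ensemble scheme described above. It assumes a user-supplied `fit_em` routine (a hypothetical name) that runs the EM algorithm on one replicate and returns a 0/1 selection vector over the chosen columns; the replicate sizes, vote threshold, and majority-vote aggregation rule are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def ensemble_em(y, X, fit_em, n_replicates=100, feature_frac=0.5,
                threshold=0.5, rng=None):
    """Sketch of the ensemble scheme: run an EM-based selector on
    bootstrap replicates built from a subset of samples and a subset
    of covariates, then aggregate the selections by voting."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    votes = np.zeros(p)    # times each covariate was selected
    trials = np.zeros(p)   # times each covariate was eligible
    for _ in range(n_replicates):
        rows = rng.integers(0, n, size=n)  # bootstrap the samples
        cols = rng.choice(p, size=max(1, int(feature_frac * p)),
                          replace=False)   # random subset of covariates
        selected = fit_em(y[rows], X[np.ix_(rows, cols)])  # 0/1 vector
        votes[cols] += selected
        trials[cols] += 1
    freq = np.divide(votes, trials, out=np.zeros(p), where=trials > 0)
    return freq >= threshold  # covariates selected by majority vote
```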

Highlights

  • Consider a simple linear regression model with Gaussian noise: y = Xβ + e, (1.1) where y = (y1, . . . , yn)T is the n × 1 response vector, X is the n × p design matrix, β = (β1, . . . , βp)T is the unknown regression coefficient vector, and e = (e1, . . . , en)T is a vector of i.i.d. Gaussian errors

  • Borrowing the idea of bagging, we propose an ensemble version of our EM algorithm: apply our EM algorithm to multiple Bayesian bootstrap (BB) copies of the data, and aggregate the variable selection results (see the Bayesian bootstrap sketch after this list)

  • Variable selection is an important problem in modern statistics
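
As a side note on the Bayesian bootstrap mentioned above: instead of resampling rows with multinomial counts as the classical bootstrap does, each BB copy of the data assigns continuous Dirichlet(1, . . . , 1) weights to the n observations (Rubin, 1981). A minimal sketch:

```python
import numpy as np

def bayesian_bootstrap_weights(n, rng=None):
    """One set of Bayesian bootstrap weights: w ~ Dirichlet(1, ..., 1),
    i.e., every observation keeps a positive, continuous weight and the
    weights sum to one (Rubin, 1981)."""
    rng = np.random.default_rng(rng)
    return rng.dirichlet(np.ones(n))

# Each BB copy reweights the observations rather than resampling them,
# so no row of (y, X) is ever dropped outright.
w = bayesian_bootstrap_weights(10, rng=0)
print(w.sum())  # 1.0
```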

Summary

Introduction

Consider a simple linear regression model with Gaussian noise: y = Xβ + e, (1.1) where y = (y1, . . . , yn)T is the n × 1 response vector, X is the n × p design matrix, β = (β1, . . . , βp)T is the unknown regression coefficient vector, and e = (e1, . . . , en)T is a vector of i.i.d. Gaussian errors. Ročková and George (2014) proposed a simple, elegant EM algorithm for Bayesian variable selection. They adopted a continuous version of the “spike and slab” prior, in which the spike and the slab components in (1.2) are two normal distributions with different variances (George and McCulloch, 1993), and proposed an EM algorithm to obtain the MAP estimator of the regression coefficients β. We adopt the same continuous “spike and slab” prior as Ročková and George (2014), but while their algorithm returns βMAP by treating γ as latent, our approach treats β as latent and returns γMAP, the MAP estimator of the model index.
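
To illustrate the continuous “spike and slab” prior, suppose the spike component is N(0, v0) and the slab component is N(0, v1) with v0 < v1, and let theta denote the prior inclusion probability. When γ is treated as latent, as in Ročková and George (2014), the E-step reduces to the conditional inclusion probability sketched below; the hyperparameter values here are illustrative only (the paper's choices appear in the Prior Specification section):

```python
import numpy as np
from scipy.stats import norm

def inclusion_prob(beta_j, v0=0.01, v1=10.0, theta=0.5):
    """P(gamma_j = 1 | beta_j) under the mixture prior
    beta_j | gamma_j ~ (1 - gamma_j) N(0, v0) + gamma_j N(0, v1)."""
    slab = theta * norm.pdf(beta_j, scale=np.sqrt(v1))
    spike = (1 - theta) * norm.pdf(beta_j, scale=np.sqrt(v0))
    return slab / (slab + spike)

print(inclusion_prob(0.05))  # near 0: small coefficients favor the spike
print(inclusion_prob(2.00))  # near 1: large coefficients favor the slab
```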

Prior Specification
The EM Algorithm
Computational Cost
Asymptotic Consistency
Bayesian Bootstrap
Empirical Study
Performance on a Widely Used Benchmark
Performance on a Highly-Correlated Data Set
Performance on a Large-p Small-n Example
A Real Example
Further Discussion
Proofs
Selection Consistency when p ≫ n