Abstract
We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The chosen model is the one with the minimum final prediction error (FPE) among all candidate models. This is implemented via the R package modelSampler. However, model selection based on the FPE criterion is questionable, because FPE can be sensitive to perturbations of the data. The package can therefore be used for empirical assessment of the stability of FPE-based selection. Stable model selection is accomplished by a bootstrap wrapper that calls the primary function of the package repeatedly on bootstrapped data. The heart of the method is model averaging, used both for stable variable selection and to study the behavior of variables over the entire model space, a concept invaluable in high-dimensional situations.
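The bootstrap wrapper described above can be illustrated with a small conceptual sketch. This is not the modelSampler implementation (which uses a spike and slab sampler in R); it is a self-contained Python toy that selects the subset of regressors minimizing Akaike's FPE by exhaustive search, then repeats that selection on bootstrap resamples to expose the instability of the FPE criterion. All variable names and the simulated data are illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def fpe(X, y):
    """Akaike's final prediction error for an OLS fit: MSE * (n + k) / (n - k)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid @ resid / n) * (n + k) / (n - k)

def best_subset_by_fpe(X, y):
    """Return the variable subset minimizing FPE (exhaustive search)."""
    p = X.shape[1]
    best, best_val = None, np.inf
    for r in range(1, p + 1):
        for subset in itertools.combinations(range(p), r):
            val = fpe(X[:, subset], y)
            if val < best_val:
                best, best_val = subset, val
    return best

# Toy data: only x0 and x1 carry signal; x2..x4 are pure noise.
n, p = 60, 5
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.standard_normal(n)

# Bootstrap wrapper: redo FPE-best selection on resampled data and count
# how often each variable is chosen (its bootstrap inclusion frequency).
B = 100
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)
    for j in best_subset_by_fpe(X[idx], y[idx]):
        counts[j] += 1
print(counts / B)  # signal variables selected nearly always; noise variables unstable
```

The inclusion frequencies across bootstrap replicates are exactly the kind of model-averaged stability summary the package is built around: a variable that appears only in a minority of resampled FPE-best models is a fragile selection.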
Highlights
Variable selection in linear regression models is an important aspect of many scientific analyses
For comparison purposes we have considered three other methods: Random Forest, Boosting, and Bayesian Model Averaging (BMA); the first two are frequentist methods, while BMA is based on Bayesian methodology
Note that of these four methods (the three above plus our rescaled spike and slab (RSS) method), only RSS and BMA perform variable selection, so their out-of-bag (OOB) prediction error (PE) computations are always based on a subset of variables, whereas Random Forest (RF) and Boosting use all variables for PE computation
Summary
Variable selection in linear regression models is an important aspect of many scientific analyses. Note that unlike traditional BMA, where the goal is prediction (Hoeting et al., 1999), our ensemble is derived solely for the purpose of variable selection. This type of analysis is very different from the linear regression implementation via the bicreg function of the R package BMA (Raftery et al., 2010) for Bayesian model averaging. The unique feature of the bimodal prior in the RSS model (details discussed later) is that it creates a unique mapping between each posterior sample and a visited model (for details see the Gibbs sampler in the Appendix). This mapping is what makes FPE-based variable selection possible. The package produces high-dimensional graphics to visualize several salient features of the variable selection procedure, such as the importance of variables relative to the total number of variables in the data set, the entire model space, the instability of the FPE criterion, and prediction error plots
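The mapping from a posterior sample to a visited model can be sketched with a minimal spike and slab Gibbs sampler in the style of stochastic search variable selection. This is a conceptual illustration, not the RSS sampler from the Appendix: here each coefficient has a latent indicator choosing between a narrow spike and a wide slab normal prior, and each Gibbs draw of the indicator vector is, by construction, one visited model. Hyperparameter values and the simulated data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: true coefficients (2, 0, 0, -1.5) -- x0 and x3 carry signal.
n, p = 80, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, 0.0, -1.5]) + rng.standard_normal(n)

# Illustrative spike/slab standard deviations and (fixed) noise variance.
tau0, tau1, sigma2 = 0.05, 5.0, 1.0

def norm_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

gamma = np.ones(p, dtype=int)   # start from the full model
inclusion = np.zeros(p)
visits = {}
XtX, Xty = X.T @ X, X.T @ y
burn, keep = 500, 1500

for it in range(burn + keep):
    # Draw beta | gamma: Gaussian with precision X'X / sigma2 + D^{-1},
    # where D holds the prior variance implied by each indicator.
    d = np.where(gamma == 1, tau1, tau0) ** 2
    cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / d))
    beta = rng.multivariate_normal(cov @ (Xty / sigma2), cov)
    # Draw gamma_j | beta_j: which mixture component explains beta_j better?
    p_slab, p_spike = norm_pdf(beta, tau1), norm_pdf(beta, tau0)
    gamma = (rng.random(p) < p_slab / (p_slab + p_spike)).astype(int)
    if it >= burn:
        inclusion += gamma
        key = tuple(gamma)               # each posterior draw maps to one model
        visits[key] = visits.get(key, 0) + 1

print(inclusion / keep)                  # posterior inclusion frequencies
print(max(visits, key=visits.get))       # most frequently visited model
```

Because every posterior draw corresponds to exactly one indicator vector, tabulating the draws gives both per-variable inclusion frequencies and a ranking of visited models, which is the ensemble summary the variable selection graphics are built from.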