Variable Selection with Second-Generation P-Values

Yi Zuo, Thomas G Stewart, Jeffrey D Blume

doi:10.1080/00031305.2021.1946150

Open Access

Journal: The American Statistician	Publication Date: Jun 29, 2021
Citations: 5	License type: open-access

Affiliation: Vanderbilt University

Abstract

Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here we report on a novel variable selection approach called Penalized regression with Second-Generation P-Values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an ℓ0 penalization scheme with second-generation p-values (SGPV), instead of traditional ones, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm can maintain its good performance even when there is strong collinearity among features or when a high dimensional feature space with p > n is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and minimax concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios. Supplementary materials are available online.
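
To make the screening idea in the abstract concrete, here is a minimal Python sketch of a second-generation p-value and of a keep-or-drop rule built on it. It assumes the usual SGPV definition from the second-generation p-value literature: the fraction of the interval estimate that overlaps an interval null, with a correction when the estimate is more than twice as wide as the null. The function names `sgpv` and `select_variables`, the symmetric null interval, and the example numbers are hypothetical illustrations. This is not the full ProSGPV algorithm, whose details are not given in the abstract.

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Second-generation p-value of an interval estimate against an interval null.

    Returns 0 when the two intervals do not overlap, 1 when the estimate
    lies entirely inside the null, and a value in between otherwise.
    """
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    if est_len <= 0 or null_len <= 0:
        raise ValueError("both intervals must have positive length")
    # Length of the intersection of the two intervals (0 if disjoint).
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))
    # Correction for very wide estimates: caps the value at 1/2 when the
    # estimate is more than twice as wide as the null interval.
    correction = max(est_len / (2.0 * null_len), 1.0)
    return (overlap / est_len) * correction


def select_variables(intervals, null_bound):
    """Keep the indices whose interval estimate misses [-null_bound, null_bound].

    `intervals` is a list of (lower, upper) bounds, one per coefficient.
    The symmetric null interval is an assumption made for this sketch.
    """
    return [
        i
        for i, (lo, hi) in enumerate(intervals)
        if sgpv(lo, hi, -null_bound, null_bound) == 0.0
    ]


if __name__ == "__main__":
    # Hypothetical interval estimates for three coefficients.
    intervals = [(1.0, 2.0), (-0.2, 0.2), (-2.0, 2.0)]
    print([sgpv(lo, hi, -0.5, 0.5) for lo, hi in intervals])
    print(select_variables(intervals, null_bound=0.5))
```

In this sketch a variable is kept only when its interval estimate excludes every effect size in the null interval, so the null interval's width, rather than a significance threshold, controls how aggressively variables are dropped.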

Full Text