Loss-guided stability selection

Tino Werner

doi:10.1007/s11634-023-00573-3

Abstract

In modern data analysis, sparse model selection becomes inevitable once the number of predictor variables is very high. It is well known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated Stability Selection overcomes this weakness by aggregating models fitted on subsamples of the training data and then choosing a stable predictor set, which is usually much sparser than the predictor sets of the raw models. The standard Stability Selection is based on a global criterion, namely the per-family error rate, and additionally requires expert knowledge to configure the hyperparameters suitably. Model selection depends on the loss function, i.e., predictor sets selected w.r.t. one particular loss function differ from those selected w.r.t. another loss function. Therefore, we propose a Stability Selection variant which respects the chosen loss function via an additional validation step based on out-of-sample validation data, optionally enhanced with an exhaustive search strategy. Our Stability Selection variants are widely applicable and user-friendly. Moreover, they can avoid the issue of severe underfitting, which affects the original Stability Selection for noisy high-dimensional data. Our priority is therefore not to avoid false positives at all costs but to obtain a sparse, stable model with which one can make predictions. Experiments in which we consider both regression and binary classification, with Boosting as the model selection algorithm, reveal a significant precision improvement compared to raw Boosting models while not suffering from any of the mentioned issues of the original Stability Selection.
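The idea described in the abstract (aggregate selections over subsamples, then let an out-of-sample loss decide how many stable predictors to keep) can be illustrated with a minimal sketch. This is not the paper's algorithm: the base learner here is a simple correlation ranking standing in for Lasso or Boosting, the loss is the squared loss, the candidate sets are the top-q predictors by selection frequency, and all function names and parameter values (`k`, `n_sub`, the synthetic data) are assumptions made for illustration.

```python
import random

def corr_abs(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / ((sxx * syy) ** 0.5 + 1e-12)

def base_select(X, y, k):
    """Stand-in base learner (assumption): the k columns most correlated with y."""
    p = len(X[0])
    scores = [corr_abs([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:k]

def selection_frequencies(X, y, k, n_sub, rng):
    """Fraction of half-size subsamples in which each predictor is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), n // 2)
        for j in base_select([X[i] for i in idx], [y[i] for i in idx], k):
            counts[j] += 1
    return [c / n_sub for c in counts]

def fit_ols(X, y, cols):
    """Least squares without intercept on the given columns (normal equations)."""
    m = len(cols)
    A = []
    for a in cols:
        row = [sum(r[a] * r[b] for r in X) for b in cols]
        row.append(sum(r[a] * t for r, t in zip(X, y)))
        A.append(row)
    for i in range(m):
        A[i][i] += 1e-8  # tiny ridge term to keep the system well conditioned
    for i in range(m):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, m), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(m):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [u - f * v for u, v in zip(A[r], A[i])]
    return [A[i][m] / A[i][i] for i in range(m)]

def mse(X, y, cols, beta):
    """Mean squared error of the linear model restricted to the given columns."""
    preds = [sum(b * r[c] for b, c in zip(beta, cols)) for r in X]
    return sum((t - q) ** 2 for t, q in zip(y, preds)) / len(y)

def loss_guided_stable_set(X_tr, y_tr, X_val, y_val, k=5, n_sub=50, seed=0):
    """Pick the top-q set by selection frequency that minimizes validation loss."""
    rng = random.Random(seed)
    freq = selection_frequencies(X_tr, y_tr, k, n_sub, rng)
    ranked = sorted(range(len(freq)), key=lambda j: -freq[j])
    best = None
    for q in range(1, k + 1):
        cols = ranked[:q]
        beta = fit_ols(X_tr, y_tr, cols)
        loss = mse(X_val, y_val, cols, beta)
        if best is None or loss < best[0]:
            best = (loss, sorted(cols))
    return best[1], freq

def make_data(n, p, rng):
    """Synthetic regression data: only predictors 0 and 1 carry signal."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.5) for r in X]
    return X, y

rng = random.Random(1)
X_tr, y_tr = make_data(200, 20, rng)
X_val, y_val = make_data(100, 20, rng)
stable, freq = loss_guided_stable_set(X_tr, y_tr, X_val, y_val)
print("stable set:", stable)
```

In this sketch the size of the stable set is chosen by validation loss rather than fixed in advance through an error-rate bound, which is one way to read the "loss-guided" step; a different loss function in `mse` could lead to a different stable set.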

Journal: Advances in Data Analysis and Classification
Publication Date: Dec 15, 2023
Citations: 1
License type: CC BY 4.0
