Abstract

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.

Highlights

  • In a wide range of applications, it is routine to encounter regression problems where the number of features or covariates p exceeds the sample size n, often greatly

  • When one is confident of being in an “easy” scenario with sufficiently large sample size and signal-to-noise ratio (SNR), SCAD could be considered, as it may perform notably better than Lasso and AdaLasso; however, SCAD carries more risk due to the high variability arising from its transition behavior

  • An L2 penalty and AdaLasso provide no substantive benefit over Lasso

  • An L2 penalty offers very little benefit for prediction, with Ridge performing substantially worse than all the other methods in many scenarios of moderate-to-large SNR


Introduction

In a wide range of applications, it is routine to encounter regression problems where the number of features or covariates p exceeds the sample size n, often greatly. Even in the simple case of linear models with independent Gaussian noise, estimation is nontrivial and requires specific assumptions. A common and often appropriate assumption is that of sparsity, where only a subset of the variables (the active set) have nonzero regression coefficients.
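To make this setting concrete, the following is a minimal sketch (not the authors' code) of the sparse p > n linear model just described: n observations, p independent Gaussian covariates of which only a small active set has nonzero coefficients, with a cross-validated Lasso fit as one example of the penalized methods compared in the paper. The dimensions, coefficient values, and use of scikit-learn's LassoCV are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s = 100, 500, 10  # sample size, dimensionality, sparsity (assumed values)

X = rng.standard_normal((n, p))            # independent Gaussian covariates
beta = np.zeros(p)
beta[:s] = rng.uniform(1.0, 2.0, size=s)   # active set: first s coefficients nonzero
y = X @ beta + rng.standard_normal(n)      # linear model with Gaussian noise

fit = LassoCV(cv=5).fit(X, y)              # penalty level chosen by cross-validation
selected = np.flatnonzero(fit.coef_)
print("number of variables selected:", selected.size)
print("true positives:", np.intersect1d(selected, np.arange(s)).size)

The same simulated data could be passed to the other penalized estimators studied in the paper (e.g., Ridge, Elastic Net) to compare prediction and selection behavior under a given scenario.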

