Abstract

Regression problems where the number of predictors, p, exceeds the number of observations, n, have become increasingly important in many diverse fields over the last couple of decades. In the classical case of "small p and large n," the least squares estimator is a practical and effective tool for estimating the model parameters. In the current Big Data era, however, models often have p much larger than n. Statisticians have developed a number of regression techniques for such problems, including the Lasso by Tibshirani (J R Stat Soc Ser B Stat Methodol 58:267–288, 1996), the SCAD by Fan and Li (J Am Stat Assoc 96(456):1348–1360, 2001), the LARS algorithm by Efron et al. (Ann Stat 32(2):407–499, 2004), the MCP estimator by Zhang (Ann Stat 38:894–942, 2010), and a tuning-free regression algorithm by Chatterjee (High dimensional regression and matrix estimation without tuning parameters, 2015, https://arxiv.org/abs/1510.07294). In this paper, we investigate the relative performance of some of these methods for parameter estimation and variable selection by analyzing real and synthetic data sets. Through an extensive Monte Carlo simulation study, we also compare the relative performance of these methods under a correlated design matrix.
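The kind of experiment the abstract describes can be illustrated with a minimal sketch: generating a sparse regression problem with p > n and an AR(1)-correlated design matrix, then fitting the Lasso with scikit-learn. All settings here (dimensions, correlation level rho, penalty alpha, sparsity pattern) are illustrative assumptions, not the paper's actual simulation design.

```python
# Illustrative sketch (assumed settings), not the paper's simulation design.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, rho = 50, 100, 0.5  # "large p, small n" regime (assumed values)

# Correlated design: AR(1) covariance, Sigma[i, j] = rho**|i - j|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Sparse true coefficient vector: only the first 5 predictors are active
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(scale=1.0, size=n)

# Fit the Lasso at a fixed (assumed) penalty level
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
mse = float(np.mean((lasso.coef_ - beta) ** 2))
print(f"nonzero coefficients selected: {selected.size}, estimation MSE: {mse:.3f}")
```

A full Monte Carlo study would repeat this over many replicates and penalty levels (chosen, e.g., by cross-validation) and tabulate selection and estimation accuracy for each competing method.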
