Abstract

In quantile linear regression with ultrahigh-dimensional data, we propose an algorithm that first screens all candidate variables and then selects the relevant predictors. Specifically, we first use quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. The proposed method can select predictors successfully when the variables are highly correlated, and it can also identify variables that contribute to the conditional quantiles but are marginally uncorrelated or only weakly correlated with the response. Theoretical results show that the proposed algorithm has the sure screening property, i.e., the screened set contains all relevant predictors with probability tending to one. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. Supplementary materials for this article are available online.
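The second step, best subset selection by EBIC over the screened predictors, can be sketched as follows. The summary does not state the exact EBIC the authors use, so this sketch assumes a check-loss (asymmetric Laplace) form, EBIC(S) = log(mean check loss of the fit on S) + |S| * (log n + 2γ log p) / (2n), in which the binomial term of Chen and Chen's EBIC is approximated by |S| log p; the paper's penalty may differ. All names (`check_loss`, `solve_linear`, `quantile_fit`, `ebic`, `best_subset_ebic`) are hypothetical, and each quantile fit is computed by brute-force enumeration of basic solutions, which is exact but only practical for toy sample sizes; a real implementation would use a linear-programming solver.

```python
import math
import random
from itertools import combinations


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting; None if (near) singular."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def quantile_fit(design, y, tau):
    """Exact tau-quantile regression for tiny n, returning (coefficients, total check loss).

    Some optimal solution interpolates p = len(design[0]) observations, so trying
    every p-subset is exact, but it costs O(n^p): a toy substitute for an LP solver.
    """
    p = len(design[0])
    best_beta, best_loss = None, math.inf
    for idx in combinations(range(len(y)), p):
        beta = solve_linear([design[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue
        loss = sum(check_loss(yi - sum(b * v for b, v in zip(beta, row)), tau)
                   for row, yi in zip(design, y))
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta, best_loss


def ebic(columns, y, subset, tau, gamma=1.0):
    """Assumed EBIC: log(mean check loss) + |S| * (log n + 2 * gamma * log p) / (2n),
    where p = len(columns) is the total number of candidate predictors."""
    n, p = len(y), len(columns)
    design = [[1.0] + [columns[j][i] for j in subset] for i in range(n)]
    _, loss = quantile_fit(design, y, tau)
    penalty = len(subset) * (math.log(n) + 2 * gamma * math.log(p)) / (2 * n)
    return math.log(max(loss / n, 1e-12)) + penalty  # guard against log(0) on exact fits


def best_subset_ebic(columns, y, screened, tau, gamma=1.0):
    """Exhaustive search over all subsets of the screened indices (including the empty one)."""
    subsets = [s for k in range(len(screened) + 1) for s in combinations(screened, k)]
    return min(subsets, key=lambda s: ebic(columns, y, s, tau, gamma))


# Toy data: only predictor 0 affects the response.
rng = random.Random(0)
n = 20
columns = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(3)]
y = [1 + 3 * columns[0][i] + 0.1 * rng.gauss(0, 1) for i in range(n)]
selected = best_subset_ebic(columns, y, screened=[0, 1, 2], tau=0.5)
print(selected)
```

Because the search is exhaustive, it is only feasible after screening has reduced the candidates to a handful of variables, which is the role the first step plays in the two-step design.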

Highlights

  • Advances in modern technology have enabled people to collect massive data with a large number of variables, many of which may be irrelevant to the response variable.

  • Before we present the quantile partial correlation (QPCOR), we review the quantile correlation (QCOR) and its connection to regression coefficients in the linear quantile regression model.

  • In sparse ultrahigh-dimensional quantile regression, we introduce three algorithms, QPCS, QTCS, and QFR, that use quantile correlation and quantile partial correlation to screen explanatory variables.
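As a minimal illustration of the quantile correlation mentioned in the highlights, the sketch below computes the sample QCOR as we understand its definition in Li et al. (2015): qcor_τ(Y, X) = n⁻¹ Σ ψ_τ(Y_i − Q̂_τ,Y)(X_i − X̄) / sqrt{(τ − τ²) σ̂²_X}, with ψ_τ(w) = τ − I(w < 0) and Q̂_τ,Y the sample τ-quantile of Y. It then ranks predictors by |QCOR|, which is a purely marginal screen and not one of the paper's QPCS, QTCS, or QFR algorithms. The helper names and the empirical-quantile convention are assumptions.

```python
import math


def sample_quantile(values, tau):
    """Empirical tau-quantile inf{v : F_n(v) >= tau} (one of several conventions)."""
    s = sorted(values)
    # The small epsilon guards against n * tau landing just above an integer
    # because of floating-point rounding (e.g. 10 * 0.3).
    k = max(math.ceil(len(s) * tau - 1e-9) - 1, 0)
    return s[k]


def psi(w, tau):
    """psi_tau(w) = tau - I(w < 0)."""
    return tau - (1.0 if w < 0 else 0.0)


def quantile_correlation(y, x, tau):
    """Sample QCOR: mean of psi_tau(Y - Q_tau,Y) * (X - X_bar), over sqrt((tau - tau^2) * var(X))."""
    n = len(y)
    q = sample_quantile(y, tau)
    x_bar = sum(x) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    qcov = sum(psi(yi - q, tau) * (xi - x_bar) for yi, xi in zip(y, x)) / n
    return qcov / math.sqrt((tau - tau ** 2) * var_x)


def marginal_qcor_screen(y, columns, tau, d):
    """Keep the indices of the d columns with the largest |QCOR| at level tau."""
    scores = [abs(quantile_correlation(y, col, tau)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:d]


x0 = list(range(10))
x1 = [1, 0] * 5                # alternating, unrelated to the median split of y
y = [2 * v for v in x0]
print(quantile_correlation(y, x0, 0.5))           # about 0.836
print(quantile_correlation(y, x1, 0.5))           # 0.0
print(marginal_qcor_screen(y, [x1, x0], 0.5, 1))  # [1]
```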


Summary

Introduction

Advances in modern technology have enabled people to collect massive data with a large number of variables, many of which may be irrelevant to the response variable. For mean regression models with ultrahigh-dimensional covariates, Fan and Lv (2008) proposed the sure independence screening (SIS) procedure, which selects variables according to the magnitudes of their marginal Pearson correlations with the response. This marginal approach ignores possible effects from other variables and may yield misleading results when the predictors are correlated. To illustrate this phenomenon, we first introduce the quantile multiple regression model and its associated estimators. We adopt the quantile partial correlation (QPCOR) of Li et al. (2015) as a criterion to measure the association of each predictor with the response at each quantile, and we introduce a new screening procedure based on the sample QPCOR. We study the asymptotic properties of the sample QPCOR and the screening properties of the variables it selects.
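The sample QPCOR replaces the marginal measure with one computed after removing the effect of a conditioning set Z. A minimal sketch, assuming the definition qpcor_τ(Y, X | Z) = n⁻¹ Σ ψ_τ(Y_i − α̂₁ − β̂₁ᵀZ_i)(X_i − α̂₂ − β̂₂ᵀZ_i) / sqrt{(τ − τ²) σ̂²}, where (α̂₁, β̂₁) come from a τ-quantile regression of Y on Z, (α̂₂, β̂₂) from a least-squares regression of X on Z, and σ̂² is the mean squared residual of the latter. The `cond_sets` argument, which lists the columns to condition on for each predictor, is hypothetical: the paper's rules for choosing conditioning sets are not reproduced here. All function names are illustrative, and the quantile regression is solved by brute-force enumeration of basic solutions rather than linear programming.

```python
import math
from itertools import combinations


def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting; None if (near) singular."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fitted(beta, row):
    return sum(b * v for b, v in zip(beta, row))


def quantile_fit(design, y, tau):
    """Exact tau-quantile regression for tiny n: some optimum interpolates
    p = len(design[0]) observations, so try every p-subset (O(n^p))."""
    p = len(design[0])
    best_beta, best_loss = None, math.inf
    for idx in combinations(range(len(y)), p):
        beta = solve_linear([design[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue
        loss = 0.0
        for row, yi in zip(design, y):
            u = yi - fitted(beta, row)
            loss += u * (tau - (1.0 if u < 0 else 0.0))  # check loss rho_tau(u)
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta


def ols_fit(design, x):
    """Least squares via the normal equations (adequate for small toy designs)."""
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * xi for r, xi in zip(design, x)) for a in range(p)]
    return solve_linear(xtx, xty)


def qpcor(y, x, z_cols, tau, tol=1e-9):
    """Sample QPCOR of Y and X given the columns in z_cols (intercept always included)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in z_cols] for i in range(n)]
    beta_q = quantile_fit(design, y, tau)  # (alpha_1, beta_1): quantile fit of Y on Z
    beta_ls = ols_fit(design, x)           # (alpha_2, beta_2): least-squares fit of X on Z
    # psi_tau(u) = tau - I(u < 0); residuals within tol of zero (the points the
    # quantile fit interpolates) are treated as exactly zero.
    psi = [tau - (1.0 if yi - fitted(beta_q, row) < -tol else 0.0)
           for row, yi in zip(design, y)]
    resid = [xi - fitted(beta_ls, row) for row, xi in zip(design, x)]
    qpcov = sum(s * r for s, r in zip(psi, resid)) / n
    var_resid = sum(r * r for r in resid) / n
    return qpcov / math.sqrt((tau - tau ** 2) * var_resid)


def qpcor_screen(y, columns, cond_sets, tau, d):
    """Keep the d predictors with the largest |QPCOR|; cond_sets[j] lists the
    (hypothetical) conditioning columns used for predictor j."""
    scores = [abs(qpcor(y, columns[j], [columns[k] for k in cond_sets[j]], tau))
              for j in range(len(columns))]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:d]


z = [float(i) for i in range(10)]
x = [zi + (1.0 if i % 2 == 0 else -1.0) for i, zi in enumerate(z)]  # X = Z plus a wiggle
y = [2.0 * zi for zi in z]                                           # Y depends on Z only
print(qpcor(y, x, [], 0.5))    # no conditioning: the marginal association, about 0.836
print(qpcor(y, x, [z], 0.5))   # conditioning on Z: essentially zero
print(qpcor_screen(y, [x, z], cond_sets=[[1], []], tau=0.5, d=1))  # [1], keeps Z
```

In this toy data, X tracks Z closely, so its marginal association with Y is strong; once Z is conditioned on, its QPCOR is essentially zero because Y depends on Z alone. This is the kind of misleading marginal signal that a purely marginal screen cannot detect.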

Theoretical properties
Screening algorithms
Simulation studies
Application
Findings
Discussion