Uniform post selection inference for LAD regression models

Victor Chernozhukov, Alexandre Belloni, Kengo Kato

doi:10.1920/wp.cem.2013.2413

Abstract

We develop uniformly valid confidence regions for a regression coefficient in a high-dimensional sparse LAD (least absolute deviation, or median) regression model. The setting is one where the number of regressors p could be large in comparison to the sample size n, but only s ≪ n of them are needed to accurately describe the regression function. Our new methods are based on the instrumental LAD regression estimator that assembles the optimal estimating equation from either post-ℓ1-penalised LAD regression or ℓ1-penalised LAD regression. The estimating equation is immunised against non-regular estimation of the nuisance part of the regression function, in the sense of Neyman. We establish that in a homoscedastic regression model, under certain conditions, the instrumental LAD regression estimator of the regression coefficient is asymptotically root-n normal uniformly with respect to the underlying sparse model. The resulting confidence regions are valid uniformly with respect to the underlying model. The new inference methods outperform the naive, 'oracle-based' inference methods, which are known not to be uniformly valid, with the coverage property failing to hold uniformly with respect to the underlying model, even in the setting with p = 2. We also provide Monte Carlo experiments which demonstrate that standard post-selection inference breaks down over large parts of the parameter space, while the proposed method does not.
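
The immunisation property mentioned above can be illustrated with a short sketch. The notation is assumed here for illustration and does not appear in the abstract: y_i is the outcome, d_i the regressor of interest with coefficient \alpha_0, x_i the p-dimensional controls, \beta_0 and \theta_0 the sparse nuisance vectors, v_i the instrument, and f_\varepsilon the density of the error. The exact conditions are those of the full text, not of this sketch.

```latex
% Assumed notation; a sketch of a Neyman-orthogonal estimating equation
% for median regression in a homoscedastic model, not the paper's exact statement.
\begin{align*}
  y_i &= d_i \alpha_0 + x_i'\beta_0 + \varepsilon_i,
      & \varepsilon_i &\text{ independent of } (d_i, x_i),\ \operatorname{med}(\varepsilon_i) = 0, \\
  d_i &= x_i'\theta_0 + v_i,
      & \mathrm{E}[x_i v_i] &= 0, \\
  \psi(t) &= \tfrac{1}{2} - 1\{t \le 0\},
      & \mathrm{E}\bigl[\psi(y_i - d_i\alpha_0 - x_i'\beta_0)\, v_i\bigr] &= 0 .
\end{align*}
% Orthogonality with respect to the nuisance parameter beta:
\begin{equation*}
  \frac{\partial}{\partial \beta}\,
  \mathrm{E}\bigl[\psi(y_i - d_i\alpha_0 - x_i'\beta)\, v_i\bigr]\Big|_{\beta = \beta_0}
  = -f_\varepsilon(0)\, \mathrm{E}[x_i v_i] = 0 .
\end{equation*}
```

Because this derivative vanishes, first-order errors in the penalised estimate of \beta_0 do not carry over to the estimating equation for \alpha_0. This is the sense in which such an equation is insensitive to non-regular estimation of the nuisance part.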

Full Text