Abstract

We consider high-dimensional inference when the assumed linear model is misspecified. We describe correct interpretations of the model parameters, which retain a useful meaning when the model is misspecified, together with corresponding sufficient assumptions for valid asymptotic inference. We focus mainly on the de-sparsified Lasso procedure, but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.

Highlights

  • The construction of confidence intervals and statistical hypothesis tests is a primary goal for assessing uncertainty in high-dimensional inference

  • The current work offers a precise description of interpretation and assumptions for inference in a misspecified high-dimensional linear model

  • A modification of the variance, as in (7), is needed for a misspecified model with random design. Such a modification seems always advisable in the random design case: it is consistent irrespective of whether the model is correct and offers some robustness against model misspecification; see Section 3.1.1.
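To illustrate why the variance needs care under random design misspecification, here is a minimal NumPy sketch of the de-sparsified Lasso for a single coordinate j, assuming centered data without intercept. It compares a classical standard error, sigma^2 ||Z_j||^2 / (Z_j^T X_j)^2, with a sandwich-type one, sum_i (resid_i Z_ij)^2 / (Z_j^T X_j)^2. The sandwich form is an assumed, commonly used choice and may differ in detail from (7); this is not the hdi implementation. The helper names (`soft_threshold`, `lasso_cd`, `desparsified_lasso_j`), the coordinate descent solver, and the tuning values `lam = lam_j = 0.1` are hypothetical choices for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso without intercept via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # current residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for k in range(p):
            r += X[:, k] * b[k]              # drop coordinate k from the fit
            rho = X[:, k] @ r / n
            b[k] = soft_threshold(rho, lam) / col_sq[k]
            r -= X[:, k] * b[k]              # add the updated coordinate back
    return b


def desparsified_lasso_j(X, y, j, lam, lam_j):
    """De-sparsified Lasso estimate for coordinate j with two standard errors."""
    n = X.shape[0]
    beta = lasso_cd(X, y, lam)
    X_rest = np.delete(X, j, axis=1)
    gamma = lasso_cd(X_rest, X[:, j], lam_j)   # nodewise Lasso of X_j on X_{-j}
    z = X[:, j] - X_rest @ gamma               # score vector Z_j
    resid = y - X @ beta
    denom = z @ X[:, j]
    b_j = beta[j] + z @ resid / denom          # one-step bias correction
    # Classical form: treats the error as independent of X (correct model).
    sigma2 = resid @ resid / n
    se_classic = np.sqrt(sigma2 * (z @ z)) / abs(denom)
    # Sandwich form: uses the products resid_i * Z_ij directly.
    se_sandwich = np.sqrt(np.sum((resid * z) ** 2)) / abs(denom)
    return b_j, se_classic, se_sandwich


# Misspecified model: the regression is nonlinear in X_0, but for a Gaussian
# design the best linear projection coefficient of X_0 is 1 and all others are 0.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.5 * (X[:, 0] ** 2 - 1) + rng.standard_normal(n)

b0, se_classic, se_sandwich = desparsified_lasso_j(X, y, j=0, lam=0.1, lam_j=0.1)
print(f"b_0 = {b0:.3f}, classic se = {se_classic:.4f}, sandwich se = {se_sandwich:.4f}")
```

In this simulated design the approximation error 0.5(X_0^2 - 1) depends on X_0, so the classical standard error should come out noticeably smaller than the sandwich one; for a correctly specified model with homoscedastic noise the two should roughly agree.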


Summary

Introduction

The construction of confidence intervals and statistical hypothesis tests is a primary goal for assessing uncertainty in high-dimensional inference. The novelty of this work is that we explicitly discuss the implications of linear model misspecification for the construction of confidence intervals and hypothesis tests in high dimensions. We believe that this is a missing piece which should be addressed: it is often treated informally, following the folklore that the procedure leads to inference for the “best projected regression parameters”. We make this precise and show that some modifications are necessary for the random design case (see above). These modifications are implemented in the statistical R software package hdi [21], which includes various methods for frequentist high-dimensional inference [10].

The de-sparsified Lasso for potentially misspecified linear models
Results when the model is correctly specified
Random design model
Estimation of the variance
Practical recommendation
Gaussian design
Fixed design model
Some empirical results
Simulations for random design
Simulations for fixed design
Discussion
Sample splitting methods
Preliminary results
Proof of Proposition 1
Proof of Proposition 3
Proof of Proposition 4
Proof of Proposition 5
