Abstract
The paper considers the problem of out-of-sample risk estimation in the high-dimensional setting, where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of leave-one-out cross-validation, we propose a computationally efficient, closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only a minor computational overhead. Under mild assumptions about the data-generating process, we obtain a finite-sample upper bound for the difference between leave-one-out cross-validation and approximate leave-one-out cross-validation, |LO − ALO|. Our theoretical analysis shows that |LO − ALO| → 0 with overwhelming probability as n, p → ∞, where the dimension p of the feature vectors may be comparable with, or even greater than, the number of observations n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO − ALO| decreases as n and p increase, revealing the excellent finite-sample performance of approximate leave-one-out cross-validation. We further illustrate the usefulness of our proposed out-of-sample risk estimation method with an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
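To make the idea concrete, the sketch below shows the ridge-regression special case, where the leave-one-out risk has an exact closed form computable from the single full-data fit via the hat matrix; the ALO formula of the paper generalizes this kind of shortcut to a much larger class of regularized estimators. The function names, the choice of ridge, and the brute-force comparison are illustrative assumptions, not the paper's general construction.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate beta_hat = (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def closed_form_loo_risk_ridge(X, y, lam):
    """Leave-one-out risk for ridge, computed from the full fit only.

    Uses the hat matrix H = X (X^T X + lam I)^{-1} X^T and the identity
    y_i - yhat_{(-i), i} = (y_i - yhat_i) / (1 - H_ii),
    so no model has to be refit n times.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)


def brute_force_loo_risk(X, y, lam):
    """Brute-force leave-one-out risk: refit with each observation held out."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta = ridge_fit(X[mask], y[mask], lam)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, lam = 300, 200, 1.0            # p comparable with n (high-dimensional regime)
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta = rng.standard_normal(p)        # dense coefficients: no sparsity assumed
    y = X @ beta + 0.5 * rng.standard_normal(n)
    print("closed-form LOO risk:", closed_form_loo_risk_ridge(X, y, lam))
    print("brute-force LOO risk:", brute_force_loo_risk(X, y, lam))
```

For ridge the two quantities agree exactly; for non-quadratic losses and regularizers the closed-form shortcut is only approximate, which is the |LO − ALO| gap the paper bounds.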