Abstract

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey relates these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. In conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.

Highlights

  • Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned

  • Model selection can be seen as a particular algorithm selection problem

  • V-fold penalization satisfies an oracle inequality with C_n → 1 as n → +∞, both when V = O(1) (Arlot, 2008b) and when V = n (Arlot, 2009); a generic form of such an inequality is sketched after this list
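
For context, an oracle inequality of the kind mentioned in the last highlight typically takes the following generic form (a sketch only; the loss ℓ, target s, estimators ŝ_m, model collection M_n and remainder term r_n are illustrative notation, not the survey's exact statement):

    \[
      \mathbb{E}\bigl[\ell(s, \widehat{s}_{\widehat{m}})\bigr]
      \;\le\; C_n \, \inf_{m \in \mathcal{M}_n} \mathbb{E}\bigl[\ell(s, \widehat{s}_m)\bigr] \;+\; r_n,
      \qquad C_n \to 1 \text{ as } n \to +\infty,
    \]

Here m̂ is the model selected by V-fold penalization, so the selected estimator performs almost as well, up to the factor C_n and the remainder r_n, as the best model in the collection (the "oracle").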

Summary

Introduction

Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned. Let us call a statistical algorithm any function that returns an estimator from data (for instance, likelihood maximization on some given model). Some CV procedures have been proved to fail for some model selection problems, depending on the goal of model selection: estimation or identification (see Section 2). Which CV procedure should be used for a given model selection problem? A brief overview of some model selection procedures is given in Section 3; these are important for a better understanding of CV. The general performance of CV for model selection is described when the goal is either estimation (Section 6) or identification (Section 7).
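
To make the algorithm selection viewpoint concrete, below is a minimal sketch of V-fold cross-validation used to choose among candidate statistical algorithms. The candidates (least-squares fits on polynomial models of increasing degree), the simulated data and all names are illustrative assumptions, not the survey's setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative data: an unknown regression function observed with noise.
    n = 200
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)

    def fit(degree, x_train, y_train):
        # A "statistical algorithm": least squares on the polynomial model
        # of the given degree; it returns an estimator (its coefficients).
        return np.polyfit(x_train, y_train, degree)

    def empirical_risk(coef, x_test, y_test):
        # Quadratic empirical risk of the fitted estimator on held-out data.
        return np.mean((np.polyval(coef, x_test) - y_test) ** 2)

    def vfold_splits(n, V, rng):
        # One random partition of {0, ..., n-1} into V folds, reused for
        # every candidate so all algorithms are compared on the same splits.
        return np.array_split(rng.permutation(n), V)

    def cv_risk(degree, x, y, splits):
        # V-fold CV estimate of the risk of the algorithm indexed by `degree`.
        risks = []
        for test_idx in splits:
            train = np.ones(len(x), dtype=bool)
            train[test_idx] = False
            coef = fit(degree, x[train], y[train])
            risks.append(empirical_risk(coef, x[test_idx], y[test_idx]))
        return np.mean(risks)

    splits = vfold_splits(n, V=5, rng=rng)
    cv = {d: cv_risk(d, x, y, splits) for d in range(1, 10)}
    selected = min(cv, key=cv.get)  # model selection: minimize the CV risk estimate
    print("selected degree:", selected, "CV risk:", cv[selected])

One design choice worth noting: the same random splits are reused for every candidate degree, so the CV risk estimates of the competing algorithms are computed on identical training/test divisions.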

Statistical framework
Statistical problems
Statistical algorithms and estimators
Model selection
The model selection paradigm
Model selection for estimation
Model selection for identification
Overview of some model selection procedures
Estimation
Other approaches
Where are cross-validation procedures in this picture?
Cross-validation procedures
Cross-validation philosophy
Hold-out
General definition of cross-validation
Classical examples
Exhaustive data splitting
Partial data splitting
Other cross-validation-like risk estimators
Historical remarks
Statistical properties of cross-validation estimators of the risk
Theoretical assessment of bias
Bias correction
Variability factors
Theoretical assessment of variance
Variance estimation
Risk estimation and model selection
The big picture
Results in various frameworks
General conditions towards model consistency
Refined analysis for the algorithm selection problem
Time series and dependent observations
Large number of models
Robustness to outliers
Density estimation
Closed-form formulas and fast computation
Conclusion
The big picture
How should the splits be chosen?
V-fold cross-validation
Cross-validation or penalized criteria?
Future research