Abstract
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
Highlights
Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned
Model selection can be seen as a particular algorithm selection problem
V -fold penalization satisfies an oracle inequality with Cn → 1 as n → +∞, both when V = O(1) (Arlot, 2008b) and when V = n (Arlot, 2009)
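The oracle inequality mentioned in the last highlight can be written in its generic form as follows; the notation below is a standard formulation assumed for illustration, not taken verbatim from the survey:

$$
\ell\bigl(s, \widehat{s}_{\widehat{m}}\bigr) \;\le\; C_n \inf_{m \in \mathcal{M}_n} \ell\bigl(s, \widehat{s}_m\bigr) \;+\; R_n ,
$$

where $\ell$ is the loss, $s$ the target, $\widehat{s}_m$ the estimator returned by model $m \in \mathcal{M}_n$, and $\widehat{m}$ the model selected by the procedure. The procedure is asymptotically optimal for estimation when the leading constant satisfies $C_n \to 1$ and the remainder $R_n$ is negligible compared to the oracle risk.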
Summary
Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned. Let us call a statistical algorithm any function that returns an estimator from data—for instance, likelihood maximization on some given model. Some CV procedures have been proved to fail for some model selection problems, depending on the goal of model selection, estimation or identification (see Section 2). Which CV procedure should be used for a given model selection problem? A brief overview of some model selection procedures is given in Section 3; these are important for better understanding CV. The general performances of CV for model selection are described when the goal is either estimation (Section 6) or identification (Section 7).
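The model selection setting described above can be sketched with V-fold cross-validation, the most common CV procedure discussed in the survey. Below is a minimal illustration, not taken from the survey itself: the "models" are polynomial fits of increasing degree, the statistical algorithm is least squares on each model, and the degree minimizing the V-fold estimate of the quadratic risk is selected. All function names and data here are illustrative assumptions.

```python
import numpy as np

def vfold_cv_risk(x, y, fit, predict, V=5, seed=0):
    """Estimate the prediction risk of a statistical algorithm by V-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), V)
    risks = []
    for j in range(V):
        test = folds[j]
        train = np.concatenate([folds[k] for k in range(V) if k != j])
        model = fit(x[train], y[train])          # train on V-1 folds
        resid = predict(model, x[test]) - y[test]
        risks.append(np.mean(resid ** 2))        # quadratic risk on held-out fold
    return np.mean(risks)

# Simulated regression data: the true regression function has degree 2.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - x ** 2 + rng.normal(0.0, 0.1, 200)

# Each candidate "model" is the set of polynomials of a given degree;
# least squares on that model is the statistical algorithm being compared.
def make_fit(degree):
    return lambda xs, ys: np.polyfit(xs, ys, degree)

def predict(coefs, xs):
    return np.polyval(coefs, xs)

cv_risks = {d: vfold_cv_risk(x, y, make_fit(d), predict, V=5) for d in range(6)}
best_degree = min(cv_risks, key=cv_risks.get)
print(cv_risks)
print("selected model:", best_degree)
```

The selected degree plays the role of the chosen model; replacing the plain CV risk estimate with a V-fold penalty, as in the highlighted results, changes only how the criterion being minimized is built.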