Abstract

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey relates these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. In conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.

Highlights

  • Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned

  • Model selection can be seen as a particular algorithm selection problem

  • V-fold penalization satisfies an oracle inequality with C_n → 1 as n → +∞, both when V = O(1) (Arlot, 2008b) and when V = n (Arlot, 2009); a generic form of such an inequality is sketched after this list
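
For context, an oracle inequality of the kind mentioned in the last highlight typically takes the following generic form (a sketch only; the loss ℓ, target s, estimators ŝ_m, model collection M_n and remainder term r_n are illustrative notation, not the survey's exact statement):

    \[
      \mathbb{E}\bigl[\ell(s, \widehat{s}_{\widehat{m}})\bigr]
      \;\le\; C_n \, \inf_{m \in \mathcal{M}_n} \mathbb{E}\bigl[\ell(s, \widehat{s}_m)\bigr] \;+\; r_n,
      \qquad C_n \to 1 \text{ as } n \to +\infty,
    \]

Here m̂ is the model selected by V-fold penalization, so the selected estimator performs almost as well, up to the factor C_n and the remainder r_n, as the best model in the collection (the "oracle").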

Summary

Introduction

Likelihood maximization, least squares and empirical contrast minimization require choosing a model, that is, a set from which an estimator will be returned. Let us call a statistical algorithm any function that returns an estimator from data (for instance, likelihood maximization on some given model). Some CV procedures have been proved to fail for some model selection problems, depending on the goal of model selection: estimation or identification (see Section 2). Which CV procedure should be used for a given model selection problem? A brief overview of some model selection procedures is given in Section 3; these are important for a better understanding of CV. The general performance of CV for model selection is described when the goal is either estimation (Section 6) or identification (Section 7).
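
To make the algorithm selection viewpoint concrete, below is a minimal sketch of V-fold cross-validation used to choose among candidate statistical algorithms. The candidates (least-squares fits on polynomial models of increasing degree), the simulated data and all names are illustrative assumptions, not the survey's setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative data: an unknown regression function observed with noise.
    n = 200
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)

    def fit(degree, x_train, y_train):
        # A "statistical algorithm": least squares on the polynomial model
        # of the given degree; it returns an estimator (its coefficients).
        return np.polyfit(x_train, y_train, degree)

    def empirical_risk(coef, x_test, y_test):
        # Quadratic empirical risk of the fitted estimator on held-out data.
        return np.mean((np.polyval(coef, x_test) - y_test) ** 2)

    def vfold_splits(n, V, rng):
        # One random partition of {0, ..., n-1} into V folds, reused for
        # every candidate so all algorithms are compared on the same splits.
        return np.array_split(rng.permutation(n), V)

    def cv_risk(degree, x, y, splits):
        # V-fold CV estimate of the risk of the algorithm indexed by `degree`.
        risks = []
        for test_idx in splits:
            train = np.ones(len(x), dtype=bool)
            train[test_idx] = False
            coef = fit(degree, x[train], y[train])
            risks.append(empirical_risk(coef, x[test_idx], y[test_idx]))
        return np.mean(risks)

    splits = vfold_splits(n, V=5, rng=rng)
    cv = {d: cv_risk(d, x, y, splits) for d in range(1, 10)}
    selected = min(cv, key=cv.get)  # model selection: minimize the CV risk estimate
    print("selected degree:", selected, "CV risk:", cv[selected])

One design choice worth noting: the same random splits are reused for every candidate degree, so the CV risk estimates of the competing algorithms are computed on identical training/test divisions.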

Statistical framework
Statistical problems
Statistical algorithms and estimators
Model selection
The model selection paradigm
Model selection for estimation
Model selection for identification
Overview of some model selection procedures
Estimation
Other approaches
Where are cross-validation procedures in this picture?
Cross-validation procedures
Cross-validation philosophy
Hold-out
General definition of cross-validation
Classical examples
Exhaustive data splitting
Partial data splitting
Other cross-validation-like risk estimators
Historical remarks
Statistical properties of cross-validation estimators of the risk
Theoretical assessment of bias
Bias correction
Variability factors
Theoretical assessment of variance
Variance estimation
Risk estimation and model selection
The big picture
Results in various frameworks
General conditions towards model consistency
Refined analysis for the algorithm selection problem
Time series and dependent observations
Large number of models
Robustness to outliers
Density estimation
Closed-form formulas and fast computation
Conclusion
The big picture
How should the splits be chosen?
V-fold cross-validation
Cross-validation or penalized criteria?
Future research