K‐fold cross‐validation for complex sample surveys

Jerzy Wieczorek, Thomas Mcmahon, Cole Guerin

doi:10.1002/sta4.454


Open Access


Journal: Stat (International Statistical Institute)
Publication Date: May 3, 2022
Citations: 18
License type: CC BY 4.0

Affiliation: Colby College



Abstract

Although K‐fold cross‐validation (CV) is widely used for model evaluation and selection, there has been limited understanding of how to perform CV for non‐iid data, including data from sampling designs with unequal selection probabilities. We introduce CV methodology that is appropriate for design‐based inference from complex survey sampling designs. For such data, we argue that inferences tend to improve when the folds are chosen, and the test errors are computed, in ways that account for survey design features such as stratification and clustering. Our mathematical arguments are supported by simulations, and our methods are illustrated on real survey data.
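
The abstract names two design-aware ingredients: folds chosen to respect stratification and clustering, and test errors computed with the design in mind. It does not spell out an algorithm, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. Here each cluster (PSU) is kept whole inside one fold, each stratum's clusters are dealt round-robin across folds, and test error is a design-weighted MSE using sampling weights w (assumed to be inverse selection probabilities). The function names (survey_folds, weighted_mse, fit_wls, survey_cv_mse), the simple weighted linear model, and the synthetic data are hypothetical illustrations.

```python
import random
from collections import defaultdict


def survey_folds(strata, clusters, k, seed=0):
    """Hypothetical helper: return a fold id in 0..k-1 for each observation.

    Every cluster (PSU) stays whole inside a single fold, and within each
    stratum the clusters are shuffled and dealt round-robin across folds,
    so each fold gets clusters from every stratum that has at least k.
    """
    rng = random.Random(seed)
    psus_by_stratum = defaultdict(set)
    for s, c in zip(strata, clusters):
        psus_by_stratum[s].add(c)

    cluster_fold = {}
    for s in sorted(psus_by_stratum):
        psus = sorted(psus_by_stratum[s])
        rng.shuffle(psus)
        start = rng.randrange(k)  # random start so leftover clusters don't always land in fold 0
        for i, c in enumerate(psus):
            cluster_fold[(s, c)] = (start + i) % k  # cluster ids are nested in strata
    return [cluster_fold[(s, c)] for s, c in zip(strata, clusters)]


def weighted_mse(y, yhat, w):
    """Design-weighted mean squared error: sum(w * (y - yhat)^2) / sum(w)."""
    num = sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, yhat, w))
    return num / sum(w)


def fit_wls(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sw
    ybar = sum(wi * yi for yi, wi in zip(y, w)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    b = sxy / sxx
    return ybar - b * xbar, b


def survey_cv_mse(x, y, w, strata, clusters, k=5, seed=0):
    """K-fold CV estimate of design-weighted test MSE for the WLS line.

    Out-of-fold predictions are pooled and scored with the sampling weights.
    """
    folds = survey_folds(strata, clusters, k, seed)
    preds = [None] * len(y)
    for f in range(k):
        train = [i for i, fi in enumerate(folds) if fi != f]
        test = [i for i, fi in enumerate(folds) if fi == f]
        if not test:
            continue
        a, b = fit_wls([x[i] for i in train], [y[i] for i in train],
                       [w[i] for i in train])
        for i in test:
            preds[i] = a + b * x[i]
    return weighted_mse(y, preds, w)


if __name__ == "__main__":
    # Hypothetical synthetic design: 3 strata x 6 clusters x 4 units each.
    rng = random.Random(1)
    strata, clusters, x, y, w = [], [], [], [], []
    for s in range(3):
        for c in range(6):
            for _ in range(4):
                xi = rng.uniform(0, 10)
                strata.append(s)
                clusters.append(c)
                x.append(xi)
                y.append(1 + 2 * xi + rng.gauss(0, 1))
                w.append(10.0 * (s + 1))  # hypothetical stratum-level weights
    print(survey_cv_mse(x, y, w, strata, clusters, k=3))
```

Keeping clusters intact mimics how new clusters, not new units from already-sampled clusters, would appear in a fresh sample; weighting the pooled test errors targets a population-level error rather than a sample-level one. A real analysis would also need to handle strata with fewer than k clusters and use a model appropriate to the data.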
