Cross-validation (CV), although widely used for model selection, can have three major weaknesses. First, the standard 10-fold CV is often unstable in its choice of the best model among the candidates. Second, by singling out one candidate on the basis of the total prediction error over the folds, CV conveys no meaningful information about how much the apparent winner can be trusted. Third, when only one data-splitting ratio is considered, whatever its value, CV may perform very poorly in some situations. To address these shortcomings, we propose a new averaging-voting version of cross-validation that yields more reliable model comparisons. Simulations and real data are used to illustrate the superiority of the new approach over traditional CV methods.
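The abstract does not spell out the averaging-voting procedure, so the following is only a hypothetical sketch of the general idea it motivates, not the authors' method. Instead of reporting one winner from a single 10-fold run, it draws many random train/test splits at several training fractions, gives one vote to the candidate with the lowest held-out mean squared error on each split, and reports each candidate's vote share. The two candidate models (`fit_mean`, `fit_line`), the function `voting_cv`, the fractions `(0.5, 0.7, 0.9)`, and the synthetic data are all illustrative assumptions.

```python
import random
from collections import Counter

def fit_mean(xs, ys):
    """Hypothetical candidate A: ignore x and predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate B: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on held-out points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def voting_cv(xs, ys, candidates, train_fracs=(0.5, 0.7, 0.9), n_splits=50, seed=0):
    """Share of random splits on which each candidate has the lowest held-out MSE.

    Splits are drawn at several training fractions (an assumption), so no single
    splitting ratio decides the outcome. Ties go to the first candidate listed.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    votes = Counter()
    total = 0
    for frac in train_fracs:
        n_train = int(round(frac * len(idx)))
        for _ in range(n_splits):
            rng.shuffle(idx)
            train, test = idx[:n_train], idx[n_train:]
            errors = {}
            for name, fit in candidates.items():
                model = fit([xs[i] for i in train], [ys[i] for i in train])
                errors[name] = mse(model, [xs[i] for i in test], [ys[i] for i in test])
            votes[min(errors, key=errors.get)] += 1  # one vote for this split's winner
            total += 1
    return {name: votes[name] / total for name in candidates}

# Usage on synthetic data with a true linear trend plus Gaussian noise.
data_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [2.0 + 1.5 * x + data_rng.gauss(0, 1.0) for x in xs]
shares = voting_cv(xs, ys, {"mean": fit_mean, "line": fit_line})
print(shares)
```

The vote share gives a rough, informal indication of how consistently one candidate beats the others across splits and ratios, which is the kind of "how much to trust the winner" information a single CV total does not provide; it is not a calibrated confidence measure.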