Abstract

This article investigates resampling methods used to evaluate the performance of machine learning classification algorithms. It compares four key resampling methods: 1) Monte Carlo resampling, 2) the Bootstrap Method, 3) k-fold Cross Validation, and 4) Repeated k-fold Cross Validation. Two classification algorithms, Support Vector Machines and Random Forests, are applied to three datasets. Nine variations of the four resampling methods are used to tune parameters of the two classification algorithms on each of the three datasets. Performance is defined as how well a resampling method selects a parameter value that fits the data. A main finding is that Repeated k-fold Cross Validation, overall, outperforms the other resampling methods in selecting the best-fit parameter value across the three datasets.
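To illustrate the kind of procedure the abstract describes, the sketch below uses repeated k-fold cross-validation to select a Support Vector Machine parameter. The dataset, parameter grid, fold count, and number of repeats are assumptions chosen for illustration; they are not the article's actual experimental setup or code.

```python
# Minimal sketch: repeated k-fold cross-validation for parameter tuning.
# The dataset, candidate values of C, and fold/repeat counts below are
# illustrative assumptions, not the article's experimental configuration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Repeated k-fold: 5 folds repeated 10 times with different random splits,
# so each candidate parameter value is scored on 50 held-out folds.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

# Candidate values of the SVM cost parameter C to tune.
param_grid = {"svc__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print("Best C:", search.best_params_["svc__C"])
print("Mean cross-validated accuracy:", round(search.best_score_, 3))
```

Swapping `RepeatedKFold` for a bootstrap or Monte Carlo (repeated random train/test split) splitter would give the analogous tuning procedure for the other resampling methods compared in the article.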
