Abstract

A key issue in machine learning research is the lack of reproducibility. We illustrate the role hyperparameter search plays in this problem and show how standard hyperparameter search methods can lead to large variance in outcomes because model training during hyperparameter optimization is nondeterministic. This variation poses a problem both for the reproducibility of the hyperparameter search itself and for comparisons of different methods, each optimized via hyperparameter search. In addition, hyperparameter search may yield suboptimal hyperparameter settings, which can affect other studies, since hyperparameter settings are often copied from previously published research. To remedy this issue, we define the mean prediction error across model training runs as the objective for the hyperparameter search. We then propose a hypothesis testing procedure that performs inference on the mean performance of each hyperparameter setting and yields an equivalence class of hyperparameter settings that are not distinguishable in performance. We further embed this procedure into a group sequential testing framework to reduce the average number of model training replicates required. Empirical results on machine learning benchmarks show that, at equal computation, the proposed method reduces the variation in hyperparameter search outcomes by up to 90% while achieving equal or lower mean prediction errors than standard random search and Bayesian optimization. Moreover, the sequential testing framework reduces computation while preserving the performance of the method. The code to reproduce the results is available online and in the supplementary materials.
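To make the selection step concrete, the sketch below shows one way the abstract's idea could look in code: each hyperparameter setting is trained several times, mean validation errors are compared, and the settings that are not statistically distinguishable from the best one are kept as an equivalence class. This is a minimal illustration, not the paper's exact procedure; the function `train_and_evaluate`, the use of Welch's t-test, and the parameters `n_replicates` and `alpha` are assumptions made for this example.

```python
# Minimal sketch (assumed, not the authors' exact method): select an
# equivalence class of hyperparameter settings whose mean validation error
# is not statistically distinguishable from that of the best setting.
import numpy as np
from scipy import stats


def equivalence_class(settings, train_and_evaluate, n_replicates=10, alpha=0.05):
    """Return the settings whose mean error is not significantly worse than the best.

    `train_and_evaluate` is a hypothetical user-supplied function that trains a
    model once (nondeterministically) with a given setting and returns a
    validation error.
    """
    # Run several training replicates per setting and record each replicate's error.
    errors = {
        s: np.array([train_and_evaluate(s) for _ in range(n_replicates)])
        for s in settings
    }

    # Setting with the lowest mean error across replicates.
    best = min(errors, key=lambda s: errors[s].mean())

    # Keep every setting that a one-sided Welch's t-test at level alpha
    # cannot show to be worse than the best setting.
    keep = []
    for s, errs in errors.items():
        if s == best:
            keep.append(s)
            continue
        _, p_value = stats.ttest_ind(
            errs, errors[best], equal_var=False, alternative="greater"
        )
        if p_value >= alpha:  # cannot reject equal mean performance
            keep.append(s)
    return keep
```

In a group sequential variant of this idea, the replicates would be collected in batches, with clearly inferior settings dropped after each batch, which reduces the average number of training runs needed.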
