Abstract
Hyperparameter tuning and model selection are important steps in machine learning. Unfortunately, classical hyperparameter-calibration and model-selection procedures are sensitive to outliers and heavy-tailed data. In this work, we construct a selection procedure, based on a median-of-means principle, that can be seen as a robust alternative to cross-validation. Using this procedure, we also build an ensemble method which, given a collection of algorithms and corrupted heavy-tailed data, selects an algorithm, trains it on a large uncorrupted subsample, and automatically tunes its hyperparameters. In particular, the approach can turn any procedure into one that is robust to outliers and heavy-tailed data while tuning its hyperparameters automatically. The construction relies on a divide-and-conquer methodology, making the method easily scalable even on a corrupted dataset. The method is tested with the LASSO, which is known to be highly sensitive to outliers.
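The median-of-means principle behind the selection procedure can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's exact construction: the function name `mom_risk` and the random equipartition are illustrative assumptions.

```python
import random
import statistics

def mom_risk(losses, n_blocks):
    """Median-of-means estimate of a risk from per-sample losses (toy sketch).

    The samples are split into n_blocks disjoint blocks; each block is
    averaged and the median of the block means is returned.  A handful of
    corrupted samples can spoil only a handful of blocks, so the median
    stays close to the true risk even when the empirical mean does not.
    """
    losses = list(losses)       # work on a copy
    random.shuffle(losses)      # random equipartition of the sample
    blocks = [losses[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)
```

For example, with twenty losses equal to 0.1 and a single corrupted loss of 1000, the empirical mean exceeds 47 while the median-of-means estimate over 7 blocks stays at roughly 0.1, since the outlier can contaminate at most one block.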
Highlights
Robustness has become an important subject of interest in the machine learning community over the last few years because large datasets are very likely to be corrupted
Robust alternatives to empirical risk minimizers and their penalized/regularized versions have been studied in density estimation [5] and least-squares regression [4, 36, 20, 50, 55]
To compute the minmax-MOM selection procedure in the context of the ensemble method defined in Section 4.1, the empirical risk of each estimator fm has to be computed only on the 2K0-partition, which, thanks to (4.4), requires computing at most 8V |M|/3 empirical risks, as advertised
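The minmax-MOM comparison underlying the selection step can be sketched in simplified form. This is a hedged toy version under strong assumptions: it uses a single fixed equipartition rather than the 2K0-partition of Section 4.1, and the names `mom` and `minmax_mom_select` are illustrative, not the paper's notation.

```python
import statistics

def mom(values, n_blocks):
    """Median of block means over a fixed equipartition into n_blocks blocks."""
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)

def minmax_mom_select(loss_table, n_blocks):
    """Pick the candidate whose worst pairwise MOM loss increment is smallest.

    loss_table maps a candidate name to its per-sample validation losses.
    Candidate m is compared to every rival m2 through the MOM estimate of
    the loss difference l_m - l_m2, and the candidate minimising the
    worst-case increment over all rivals is returned.
    """
    names = list(loss_table)

    def worst_increment(m):
        return max(
            mom([a - b for a, b in zip(loss_table[m], loss_table[m2])], n_blocks)
            for m2 in names if m2 != m
        )

    return min(names, key=worst_increment)
```

In this simplified form, each of the |M|(|M|-1) ordered pairs costs one pass over the blocks, which mirrors (at toy scale) why the full procedure only needs the bounded number of empirical-risk evaluations quoted above.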
Summary
Robustness has become an important subject of interest in the machine learning community over the last few years because large datasets are very likely to be corrupted. Even if some candidate estimators are robust, outliers in the test set may mislead the selection/aggregation step, resulting in a poor final estimator. This raises the question of a robust selection/aggregation procedure, which is addressed in the present work. Theoretical guarantees for the latter are given in Theorem 3.2. The proofs are deferred to Appendices A and B