Abstract
Hyperparameter tuning and model selection are important steps in machine learning. Unfortunately, classical hyperparameter-calibration and model-selection procedures are sensitive to outliers and heavy-tailed data. In this work, we construct a selection procedure, based on a median-of-means principle, that can be seen as a robust alternative to cross-validation. Using this procedure, we also build an ensemble method which, given a collection of algorithms and corrupted heavy-tailed data, selects an algorithm, trains it on a large uncorrupted subsample, and automatically tunes its hyperparameters. In particular, the approach can turn any procedure into one that is robust to outliers and heavy-tailed data while tuning its hyperparameters automatically. The construction relies on a divide-and-conquer methodology, making the method easily scalable even on a corrupted dataset. The method is tested with the LASSO, which is known to be highly sensitive to outliers.
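The median-of-means principle behind the selection procedure can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's exact construction: the function name `mom_risk` and the random equipartition are illustrative assumptions.

```python
import random
import statistics

def mom_risk(losses, n_blocks):
    """Median-of-means estimate of a risk from per-sample losses (toy sketch).

    The samples are split into n_blocks disjoint blocks; each block is
    averaged and the median of the block means is returned.  A handful of
    corrupted samples can spoil only a handful of blocks, so the median
    stays close to the true risk even when the empirical mean does not.
    """
    losses = list(losses)       # work on a copy
    random.shuffle(losses)      # random equipartition of the sample
    blocks = [losses[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)
```

For example, with twenty losses equal to 0.1 and a single corrupted loss of 1000, the empirical mean exceeds 47 while the median-of-means estimate over 7 blocks stays at roughly 0.1, since the outlier can contaminate at most one block.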
Highlights
Robustness has become an important subject of interest in the machine learning community over the last few years because large datasets are very likely to be corrupted
Robust alternatives to empirical risk minimizers and their penalized/regularized versions have been studied in density estimation [5] and least-squares regression [4, 36, 20, 50, 55]
To compute the minmax-MOM selection procedure in the context of the ensemble method defined in Section 4.1, the empirical risk of each estimator fm has to be computed only on the 2K0-partition, which, thanks to (4.4), requires computing at most 8V |M|/3 empirical risks, as advertised
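The minmax-MOM comparison underlying the selection step can be sketched in simplified form. This is a hedged toy version under strong assumptions: it uses a single fixed equipartition rather than the 2K0-partition of Section 4.1, and the names `mom` and `minmax_mom_select` are illustrative, not the paper's notation.

```python
import statistics

def mom(values, n_blocks):
    """Median of block means over a fixed equipartition into n_blocks blocks."""
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)

def minmax_mom_select(loss_table, n_blocks):
    """Pick the candidate whose worst pairwise MOM loss increment is smallest.

    loss_table maps a candidate name to its per-sample validation losses.
    Candidate m is compared to every rival m2 through the MOM estimate of
    the loss difference l_m - l_m2, and the candidate minimising the
    worst-case increment over all rivals is returned.
    """
    names = list(loss_table)

    def worst_increment(m):
        return max(
            mom([a - b for a, b in zip(loss_table[m], loss_table[m2])], n_blocks)
            for m2 in names if m2 != m
        )

    return min(names, key=worst_increment)
```

In this simplified form, each of the |M|(|M|-1) ordered pairs costs one pass over the blocks, which mirrors (at toy scale) why the full procedure only needs the bounded number of empirical-risk evaluations quoted above.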
Summary
Robustness has become an important subject of interest in the machine learning community over the last few years because large datasets are very likely to be corrupted. Even if some candidate estimators are robust, outliers in the test set may mislead the selection/aggregation step, resulting in a poor final estimator. This raises the question of a robust selection/aggregation procedure, which is addressed in the present work. Theoretical guarantees for the latter are given in Theorem 3.2. The proofs are deferred to Appendices A and B