Abstract
We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|{\mathcal{O}}|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_{2}$-error rate is bounded by $r_{N}+AL|{\mathcal{O}}|/N$, where $N$ is the total number of observations, $r_{N}$ is the $L_{2}$-error rate in the non-contaminated setting, and $A$ is a parameter coming from the local Bernstein condition. When $r_{N}$ is minimax-rate-optimal in the non-contaminated setting, the rate $r_{N}+AL|{\mathcal{O}}|/N$ is also minimax-rate-optimal when $|{\mathcal{O}}|$ outliers contaminate the labels. The main results of the paper can be applied to many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the $\ell_{1}$-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
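For orientation, the local Bernstein condition referred to above links the excess risk to the $L_{2}(\mu)$-distance to the oracle on a neighborhood of the oracle. A paraphrase of its usual form (our rephrasing, with $f^{*}$ the risk minimizer and $r$ a localization radius; see the paper for the exact statement) is:

\[
\|f - f^{*}\|_{L_{2}(\mu)}^{2} \;\le\; A\,\Big(\mathbb{E}\big[\ell(f(X), Y)\big] - \mathbb{E}\big[\ell(f^{*}(X), Y)\big]\Big)
\quad \text{for every } f \in F \text{ with } \|f - f^{*}\|_{L_{2}(\mu)} \le r.
\]

The constant $A$ in this inequality is the one appearing in the error rate $r_{N}+AL|{\mathcal{O}}|/N$.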
Highlights
Let $(\Omega, \mathcal{A}, P)$ be a probability space, where $\Omega = \mathcal{X} \times \mathcal{Y}$.
Up to a logarithmic factor, the Regularized Empirical Risk Minimizer (RERM) associated with the Huber loss function for the problem of sparse linear regression is minimax-rate-optimal when $|{\mathcal{O}}|$ malicious outliers corrupt the labels (see the numerical sketch after these highlights).
As shown in [19], in a setting where $|{\mathcal{O}}|$ outliers contaminate only the labels, RERM with the Huber loss function is minimax-rate-optimal for the sparse-regression problem when the noise and the design of the non-contaminated data are both Gaussian. This leads to the following questions: 1. Is the (Regularized) Empirical Risk Minimizer ((R)ERM) optimal for loss functions and regression problems other than sparse regression when malicious outliers corrupt the labels?
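As an illustration of the setting described in these highlights, here is a minimal numerical sketch (assumed setup, not the authors' code): $\ell_{1}$-regularized empirical risk minimization with the Huber loss under label contamination, solved by proximal gradient descent (ISTA). All constants (sample size, sparsity, Huber threshold $\delta$, regularization level $\lambda$) are illustrative choices, not values from the paper.

```python
# A minimal sketch (assumed setup, not the paper's code): l1-regularized ERM
# with the Huber loss under label contamination, solved by proximal gradient
# descent (ISTA). The Huber loss is convex and L-Lipschitz, as in the paper.
import numpy as np

rng = np.random.default_rng(0)
N, d, s, n_out = 200, 50, 5, 10            # samples, dimension, sparsity, |O|

X = rng.standard_normal((N, d))            # Gaussian design
beta_star = np.zeros(d)
beta_star[:s] = 1.0                        # s-sparse target
y = X @ beta_star + rng.standard_normal(N) # clean labels
y[:n_out] += 50.0                          # |O| maliciously corrupted labels

delta = 1.345                              # Huber threshold (a common default)
lam = np.sqrt(np.log(d) / N)               # illustrative l1 level ~ sqrt(log d / N)

def psi(r, delta):
    """Huber score: derivative of the Huber loss, clipped (hence Lipschitz)."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

beta = np.zeros(d)
step = N / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz const. of the smooth part
for _ in range(2000):                      # ISTA: gradient step + l1 prox
    grad = -X.T @ psi(y - X @ beta, delta) / N
    beta = soft_threshold(beta - step * grad, step * lam)

print("l2 estimation error:", np.linalg.norm(beta - beta_star))
```

The bounded score $\psi$ is where robustness comes from: each corrupted label can shift the empirical gradient by at most $\delta \|X_i\|/N$, which is the mechanism behind the additive $L|{\mathcal{O}}|/N$ term in the error bound above.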
Summary
Let $(X, Y)$ be a random variable taking values in $\Omega$ with joint distribution $P$, and let $\mu$ be the marginal distribution of $X$. Let $F$ denote a class of functions $f: \mathcal{X} \to \mathcal{Y}$. For any function $f$ in $F$ we write $\ell_f(x, y) := \ell(f(x), y)$. For any distribution $Q$ on $\Omega$ and any function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ we write $Qf = \mathbb{E}_{(X,Y)\sim Q}[f(X, Y)]$. For $f \in F$, the risk of $f$ is defined as $R(f) := P\ell_f = \mathbb{E}_{(X,Y)\sim P}[\ell(f(X), Y)]$. A prediction function with minimal risk is called an oracle and is defined as $f^* \in \operatorname{argmin}_{f \in F} P\ell_f$. Since the distribution $P$ is unknown, this risk cannot be computed; instead, one is given a dataset $\mathcal{D} = (X_i, Y_i)_{i=1}^{N}$ of $N$ random variables taking values in $\mathcal{X} \times \mathcal{Y}$.
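The estimators studied in the paper then minimize the empirical counterpart of this risk over the dataset; in their standard form (with a generic norm $\|\cdot\|$ and tuning parameter $\lambda$ for the regularized version), they read:

\[
\hat f^{\mathrm{ERM}} \in \operatorname*{argmin}_{f \in F} \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big),
\qquad
\hat f^{\mathrm{RERM}} \in \operatorname*{argmin}_{f \in F} \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big) + \lambda \|f\|.
\]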