Abstract

We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|\mathcal{O}|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_{2}$-error rate is bounded by $r_{N}+AL|\mathcal{O}|/N$, where $N$ is the total number of observations, $r_{N}$ is the $L_{2}$-error rate in the non-contaminated setting, and $A$ is a parameter coming from the local Bernstein condition. When $r_{N}$ is minimax-rate-optimal in the non-contaminated setting, the rate $r_{N}+AL|\mathcal{O}|/N$ is also minimax-rate-optimal when $|\mathcal{O}|$ outliers contaminate the labels. The main results of the paper can be used for many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the $\ell_{1}$-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
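In the abstract's notation, the main guarantee can be summarized informally as follows; this is only a sketch, up to constants and the high-probability statement, with $\hat f$ denoting the (R)ERM and $f^{*}$ the oracle:

$$
\| \hat f - f^{*} \|_{L_{2}} \;\lesssim\; r_{N} + \frac{A\,L\,|\mathcal{O}|}{N}.
$$

The first term is the error rate of the non-contaminated problem and the second is the price paid for the $|\mathcal{O}|$ corrupted labels.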

Highlights

  • Let (Ω, A, P) be a probability space, where Ω = X × Y

  • Up to a logarithmic factor, the Regularized Empirical Risk Minimizer (RERM) associated with the Huber loss function for the problem of sparse linear regression is minimax-rate-optimal when |O| malicious outliers corrupt the labels

  • As shown in [19], in a setting where |O| outliers contaminate only the labels, the RERM with the Huber loss function is minimax-rate-optimal for the sparse-regression problem when the noise and the design of the non-contaminated data are both Gaussian. This leads to the following questions: 1. Is the (Regularized) Empirical Risk Minimizer ((R)ERM) optimal for loss functions and regression problems other than sparse regression when malicious outliers corrupt the labels?
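To make the highlighted sparse-regression setting concrete, here is a minimal, illustrative sketch (not from the paper): an $\ell_1$-penalized Huber M-estimator fitted on a Gaussian design with heavy-tailed noise and a handful of maliciously corrupted labels. scikit-learn's SGDRegressor is used only as a convenient stand-in for the exact RERM, and all dimensions, constants, and corruption values below are arbitrary choices made for this example.

```python
# Illustrative sketch only: an l1-penalized Huber M-estimator on label-contaminated data.
# SGDRegressor(loss="huber", penalty="l1") approximates the RERM studied in the paper;
# sample sizes, penalty level, and corruption values are arbitrary.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
N, d, s, n_outliers = 400, 200, 5, 20          # high-dimensional design, s-sparse target
w_star = np.zeros(d)
w_star[:s] = 3.0                               # sparse "oracle" coefficients
X = rng.normal(size=(N, d))                    # Gaussian design
y = X @ w_star + rng.standard_t(df=3, size=N)  # heavy-tailed (non-Gaussian) noise
y[:n_outliers] = -50.0                         # |O| = 20 maliciously corrupted labels

rerm = SGDRegressor(loss="huber", epsilon=1.35, penalty="l1",
                    alpha=0.01, max_iter=5000, tol=1e-6, random_state=0)
rerm.fit(X, y)

print("estimation error ||w_hat - w*||_2:", np.linalg.norm(rerm.coef_ - w_star))
print("selected coefficients:", int(np.sum(np.abs(rerm.coef_) > 1e-3)))
```

Despite the corrupted labels, the convex Lipschitz (Huber) loss keeps the estimation error moderate, whereas a squared-loss fit on the same data would be dominated by the outliers.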


Summary

Introduction

Let $(X, Y)$ be a random variable taking values in $\Omega = \mathcal{X} \times \mathcal{Y}$ with joint distribution $P$, and let $\mu$ be the marginal distribution of $X$. Let $F$ denote a class of functions $f : \mathcal{X} \to \mathcal{Y}$ and let $\ell$ be a loss function. For any function $f$ in $F$ we write $\ell_f(x, y) := \ell(f(x), y)$. For any distribution $Q$ on $\Omega$ and any function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ we write $Qf = \mathbb{E}_{(X,Y) \sim Q}[f(X, Y)]$. For $f \in F$, the risk of $f$ is defined as $R(f) := P\ell_f = \mathbb{E}_{(X,Y) \sim P}[\ell(f(X), Y)]$. A prediction function with minimal risk is called an oracle and is defined as $f^{*} \in \operatorname{argmin}_{f \in F} P\ell_f$. The distribution $P$ is unknown; instead, one is given a dataset $\mathcal{D} = (X_i, Y_i)_{i=1}^{N}$ of $N$ random variables taking values in $\mathcal{X} \times \mathcal{Y}$.
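With this notation, the estimators studied in the paper are, informally, the empirical risk minimizer and its regularized variant. The display below is a sketch of the standard definitions; the penalty $\lambda\|\cdot\|$ (for instance the $\ell_{1}$-norm or an RKHS norm) depends on the application treated in the paper:

$$
\hat f^{\,\mathrm{ERM}} \in \operatorname*{argmin}_{f \in F} \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big),
\qquad
\hat f^{\,\mathrm{RERM}} \in \operatorname*{argmin}_{f \in F} \Big\{ \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big) + \lambda \|f\| \Big\},
$$

where $\lambda > 0$ is a regularization parameter.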

Setting
Our contributions
Is the Gaussian assumption on the noise necessary?
Related literature
Complexity measures and parameters
Local Bernstein conditions and main results
High dimensional setting
Complexity parameters and sparsity equation
Application to the $\ell_1$-penalized Huber M-estimator with Gaussian design
Application to RKHS with the Huber loss function
Conclusion and perspectives
Proof of Theorem 1 in the sub-Gaussian setting
Proof of Theorem 1 in the locally bounded framework
Proof of Theorem 3 in the sub-Gaussian framework