Abstract
We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|{\mathcal{O}}|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_{2}$-error rate is bounded by $r_{N}+AL|{\mathcal{O}}|/N$, where $N$ is the total number of observations, $r_{N}$ is the $L_{2}$-error rate in the non-contaminated setting, and $A$ is a parameter coming from the local Bernstein condition. When $r_{N}$ is minimax-rate-optimal in the non-contaminated setting, the rate $r_{N}+AL|{\mathcal{O}}|/N$ is also minimax-rate-optimal when $|{\mathcal{O}}|$ outliers contaminate the labels. The main results of the paper can be applied to many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the $\ell_{1}$-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
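For orientation, the local Bernstein condition referred to above links the excess risk to the $L_{2}(\mu)$-distance to the oracle on a neighborhood of the oracle. A paraphrase of its usual form (our rephrasing, with $f^{*}$ the risk minimizer and $r$ a localization radius; see the paper for the exact statement) is:

\[
\|f - f^{*}\|_{L_{2}(\mu)}^{2} \;\le\; A\,\Big(\mathbb{E}\big[\ell(f(X), Y)\big] - \mathbb{E}\big[\ell(f^{*}(X), Y)\big]\Big)
\quad \text{for every } f \in F \text{ with } \|f - f^{*}\|_{L_{2}(\mu)} \le r.
\]

The constant $A$ in this inequality is the one appearing in the error rate $r_{N}+AL|{\mathcal{O}}|/N$.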
Highlights
Let $(\Omega, \mathcal{A}, P)$ be a probability space, where $\Omega = \mathcal{X} \times \mathcal{Y}$.
Up to a logarithmic factor, the Regularized Empirical Risk Minimizer (RERM) associated with the Huber loss function for the problem of sparse linear regression is minimax-rate-optimal when $|{\mathcal{O}}|$ malicious outliers corrupt the labels (see the numerical sketch after these highlights).
As shown in [19], in a setting where $|{\mathcal{O}}|$ outliers contaminate only the labels, RERM with the Huber loss function is minimax-rate-optimal for the sparse-regression problem when the noise and the design of the non-contaminated data are both Gaussian. This leads to the following questions: 1. Is the (Regularized) Empirical Risk Minimizer ((R)ERM) optimal for loss functions and regression problems other than sparse regression when malicious outliers corrupt the labels?
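As an illustration of the setting described in these highlights, here is a minimal numerical sketch (assumed setup, not the authors' code): $\ell_{1}$-regularized empirical risk minimization with the Huber loss under label contamination, solved by proximal gradient descent (ISTA). All constants (sample size, sparsity, Huber threshold $\delta$, regularization level $\lambda$) are illustrative choices, not values from the paper.

```python
# A minimal sketch (assumed setup, not the paper's code): l1-regularized ERM
# with the Huber loss under label contamination, solved by proximal gradient
# descent (ISTA). The Huber loss is convex and L-Lipschitz, as in the paper.
import numpy as np

rng = np.random.default_rng(0)
N, d, s, n_out = 200, 50, 5, 10            # samples, dimension, sparsity, |O|

X = rng.standard_normal((N, d))            # Gaussian design
beta_star = np.zeros(d)
beta_star[:s] = 1.0                        # s-sparse target
y = X @ beta_star + rng.standard_normal(N) # clean labels
y[:n_out] += 50.0                          # |O| maliciously corrupted labels

delta = 1.345                              # Huber threshold (a common default)
lam = np.sqrt(np.log(d) / N)               # illustrative l1 level ~ sqrt(log d / N)

def psi(r, delta):
    """Huber score: derivative of the Huber loss, clipped (hence Lipschitz)."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

beta = np.zeros(d)
step = N / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz const. of the smooth part
for _ in range(2000):                      # ISTA: gradient step + l1 prox
    grad = -X.T @ psi(y - X @ beta, delta) / N
    beta = soft_threshold(beta - step * grad, step * lam)

print("l2 estimation error:", np.linalg.norm(beta - beta_star))
```

The bounded score $\psi$ is where robustness comes from: each corrupted label can shift the empirical gradient by at most $\delta \|X_i\|/N$, which is the mechanism behind the additive $L|{\mathcal{O}}|/N$ term in the error bound above.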
Summary
Let $(X, Y)$ be a random variable taking values in $\Omega$ with joint distribution $P$, and let $\mu$ be the marginal distribution of $X$. Let $F$ denote a class of functions $f: \mathcal{X} \to \mathcal{Y}$. For any function $f$ in $F$ we write $\ell_f(x, y) := \ell(f(x), y)$. For any distribution $Q$ on $\Omega$ and any function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ we write $Qf = \mathbb{E}_{(X,Y)\sim Q}[f(X, Y)]$. For $f \in F$, the risk of $f$ is defined as $R(f) := P\ell_f = \mathbb{E}_{(X,Y)\sim P}[\ell(f(X), Y)]$. A prediction function with minimal risk is called an oracle and is defined as $f^* \in \operatorname{argmin}_{f \in F} P\ell_f$. Since the distribution $P$ is unknown, this risk cannot be computed; instead, one is given a dataset $\mathcal{D} = (X_i, Y_i)_{i=1}^{N}$ of $N$ random variables taking values in $\mathcal{X} \times \mathcal{Y}$.
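The estimators studied in the paper then minimize the empirical counterpart of this risk over the dataset; in their standard form (with a generic norm $\|\cdot\|$ and tuning parameter $\lambda$ for the regularized version), they read:

\[
\hat f^{\mathrm{ERM}} \in \operatorname*{argmin}_{f \in F} \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big),
\qquad
\hat f^{\mathrm{RERM}} \in \operatorname*{argmin}_{f \in F} \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(X_i), Y_i\big) + \lambda \|f\|.
\]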