Abstract

We introduce a new approach for designing computationally efficient learning algorithms that are tolerant to noise, and we demonstrate its effectiveness by designing algorithms with improved noise-tolerance guarantees for learning linear separators. We consider both the malicious noise model of Valiant [1985] and Kearns and Li [1988] and the adversarial label noise model of Kearns, Schapire, and Sellie [1994]. For malicious noise, where the adversary can corrupt both the label and the features, we provide a polynomial-time algorithm for learning linear separators in ℝ^d under isotropic log-concave distributions that can tolerate a nearly information-theoretically optimal noise rate of η = Ω(ϵ), improving on the Ω(ϵ^3/log^2(d/ϵ)) noise tolerance of Klivans et al. [2009a]. In the case that the distribution is uniform over the unit ball, this improves on the Ω(ϵ/d^{1/4}) noise tolerance of Kalai et al. [2005] and the Ω(ϵ^2/log(d/ϵ)) of Klivans et al. [2009a]. For the adversarial label noise model, where the distribution over the feature vectors is unchanged and the overall probability of a noisy label is constrained to be at most η, we also give a polynomial-time algorithm for learning linear separators in ℝ^d under isotropic log-concave distributions that can handle a noise rate of η = Ω(ϵ). In the case of the uniform distribution, this improves on the results of Kalai et al. [2005], which either required runtime super-exponential in 1/ϵ (ours is polynomial in 1/ϵ) or tolerated less noise. Our algorithms are also efficient in the active learning setting, where learning algorithms only receive the classifications of examples when they ask for them. We show that, in this model, our algorithms achieve a label complexity whose dependence on the error parameter ϵ is polylogarithmic (and thus exponentially better than that of any passive algorithm). This provides the first polynomial-time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise. Our algorithms and analysis combine several ingredients, including aggressive localization, minimization of a progressively rescaled hinge loss, and a novel localized and soft outlier removal procedure. We use localization techniques (previously used for obtaining better sample complexity results) to obtain better noise-tolerant polynomial-time algorithms.
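To make the three named ingredients concrete, the following Python sketch shows how aggressive localization and a progressively rescaled hinge loss can fit together on synthetic data. It is a minimal illustration under stated assumptions, not the paper's algorithm: the Gaussian marginal, the band-width and rescaling schedule, the subgradient solver, and especially the hard norm-based filter standing in for the paper's localized soft outlier removal are all simplifications introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, d, w_star, eta):
    """Isotropic Gaussian marginal (one example of an isotropic
    log-concave distribution); labels come from w_star, with an
    eta fraction flipped on the points nearest the boundary --
    a simple stand-in adversary, chosen for illustration only."""
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)
    flip = np.argsort(np.abs(X @ w_star))[: int(eta * n)]
    y[flip] = -y[flip]
    return X, y

def hinge_minimize(X, y, w0, tau, steps=200):
    """Projected subgradient descent on the tau-rescaled hinge loss
    (1/n) * sum max(0, 1 - y<w,x>/tau), constrained to ||w|| <= 1."""
    w, lr = w0.copy(), 0.1 * tau  # step size scaled with tau (a heuristic)
    for _ in range(steps):
        active = (y * (X @ w) / tau) < 1.0   # points with hinge subgradient
        grad = -(y[active, None] * X[active]).sum(axis=0) / (tau * len(y))
        w -= lr * grad
        w /= max(1.0, np.linalg.norm(w))     # project onto the unit ball
    return w

def localized_hinge_learner(X, y, rounds=5, b0=1.0):
    """Each round restricts attention to a band around the current
    separator (aggressive localization), filters the band, and
    re-minimizes a hinge loss whose scale shrinks with the band
    (progressive rescaling). Constants are illustrative."""
    d = X.shape[1]
    w = hinge_minimize(X, y, np.ones(d) / np.sqrt(d), tau=1.0)
    b = tau = b0
    for _ in range(rounds):
        in_band = np.abs(X @ w) <= b
        Xb, yb = X[in_band], y[in_band]
        # Stand-in for the paper's localized *soft* outlier removal:
        # hard-drop band points whose squared norm is far above the mean.
        keep = (Xb**2).sum(axis=1) <= 4 * (Xb**2).sum(axis=1).mean()
        w = hinge_minimize(Xb[keep], yb[keep], w, tau)
        b, tau = b / 2, tau / 2
    return w

d = 20
w_star = np.zeros(d); w_star[0] = 1.0
X, y = sample(20000, d, w_star, eta=0.05)
w_hat = localized_hinge_learner(X, y)
print("disagreement with target:",
      np.mean(np.sign(X @ w_hat) != np.sign(X @ w_star)))
```

The design point the sketch tries to convey is the coupling of the two schedules: as the band width b shrinks, the hinge scale τ shrinks with it, so the loss stays informative on the ever-narrower region where the current hypothesis and the target can still disagree.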
