Abstract
Noise in the class labels of any training set can lead to poor classification results regardless of the machine learning method used. In this paper, we first formalize the problem of binary classification in the presence of random noise on the class labels, which we call class noise. Class noise is conventionally modeled by a class noise rate: a small, independent probability that each class label in the training data is inverted. We propose a method to estimate the class noise rate at the level of individual samples in real data. Based on this estimate, we propose two approaches to handling class noise: the first modifies a given surrogate loss function, and the second eliminates class noise by sampling. Furthermore, we prove that, under both approaches, the optimal hypothesis on the noisy distribution approximates the optimal hypothesis on the clean distribution. Our methods achieve over 87% accuracy on a synthetic non-separable dataset even when 40% of the labels are inverted, and comparisons with other algorithms show that they outperform state-of-the-art approaches on several benchmark datasets from different domains with different noise rates.
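To make the noise model concrete, the following is a minimal sketch of the symmetric class-noise model described above: each binary label is inverted independently with a fixed probability. The function name `flip_labels`, the label encoding in {-1, +1}, and the rate of 0.4 are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def flip_labels(y, noise_rate, rng=None):
    """Invert each label in y (values in {-1, +1}) independently
    with probability noise_rate -- the symmetric class-noise model."""
    rng = np.random.default_rng(rng)
    flip = rng.random(y.shape[0]) < noise_rate  # Bernoulli(noise_rate) mask
    y_noisy = y.copy()
    y_noisy[flip] *= -1                         # invert the selected labels
    return y_noisy

# Example: 40% of labels inverted, matching the noise level in the
# synthetic experiment quoted above (illustrative data, not the paper's).
y_clean = np.sign(np.random.default_rng(0).standard_normal(1000))
y_noisy = flip_labels(y_clean, noise_rate=0.4, rng=0)
print((y_clean != y_noisy).mean())  # approximately 0.4
```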