Abstract

In label-noise learning, estimating the transition matrix is an active topic of research, as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited for class-dependent label noise (where all instances in a clean class share the same label transition matrix) to learn a clean-label classifier from noisy data. However, the CLTM cannot handle the more common instance-dependent label noise well (where the clean-to-noisy label transition matrix must be estimated at the instance level, taking the input into account), since instance-dependent CLTM estimation requires collecting a set of clean labels from the noisy data distribution, which is difficult to achieve because clean labels carry uncertainty. Motivated by the fact that classifiers mostly output Bayes optimal labels at prediction time, in this paper we directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-Label Transition Matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that, given only noisy data, estimating either the CLTM or the BLTM is ill-posed. Favorably, however, Bayes optimal labels carry no uncertainty, unlike clean labels: the class posteriors of Bayes optimal labels are one-hot vectors, while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected from the noisy data; (b) the feasible solution space is much smaller. Exploiting these advantages, this work proposes a parametric model that estimates the instance-dependent label-noise transition matrix with a deep neural network, leading to better generalization and superior classification performance. From the theoretical perspective, we prove that, by leveraging the instance-dependent Bayes-Label Transition Matrix, the classifier learned on the noisy data distribution converges to the Bayes optimal classifier defined on the clean data distribution at the optimal parametric rate for empirical risk minimization.
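To make the pipeline concrete, below is a minimal PyTorch-style sketch of the two stages the abstract describes: (1) collecting examples whose noisy class posterior is confident enough that the argmax can be taken as the Bayes optimal label, and (2) fitting a deep network that maps each instance to its own transition matrix, which is then used to correct the classifier's loss on noisy data. All names here (distill_examples, TransitionNet, the confidence threshold tau) are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# A minimal sketch of the BLTM idea, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_examples(model, loader, tau=0.95, device="cpu"):
    """Stage 1: keep examples whose predicted noisy class posterior is confident.

    For such examples, the argmax of a well-fit noisy posterior is taken as the
    (one-hot) Bayes optimal label, per the paper's motivating observation.
    """
    model.eval()
    feats, bayes_labels, noisy_labels = [], [], []
    with torch.no_grad():
        for x, y_noisy in loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            conf, pred = probs.max(dim=1)
            keep = conf >= tau  # tau is an assumed confidence threshold
            feats.append(x[keep.cpu()])
            bayes_labels.append(pred[keep].cpu())
            noisy_labels.append(y_noisy[keep.cpu()])
    return torch.cat(feats), torch.cat(bayes_labels), torch.cat(noisy_labels)

class TransitionNet(nn.Module):
    """Stage 2: a DNN mapping an instance to a row-stochastic K x K matrix T(x)."""
    def __init__(self, in_dim, num_classes, hidden=128):
        super().__init__()
        self.num_classes = num_classes
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes * num_classes),
        )

    def forward(self, x):
        logits = self.body(x.flatten(1))
        logits = logits.view(-1, self.num_classes, self.num_classes)
        return F.softmax(logits, dim=2)  # each row sums to 1

def bltm_loss(trans_net, x, y_bayes, y_noisy):
    """Fit T(x) on distilled data: the row of T(x) indexed by the Bayes optimal
    label should predict the observed noisy label."""
    T = trans_net(x)                               # (B, K, K)
    noisy_post = T[torch.arange(len(x)), y_bayes]  # (B, K)
    return F.nll_loss(torch.log(noisy_post + 1e-12), y_noisy)

def forward_corrected_loss(classifier, trans_net, x, y_noisy):
    """Train the Bayes-label classifier on noisy data: push its posterior
    through the fitted T(x) before comparing against the noisy label."""
    clean_post = F.softmax(classifier(x), dim=1)   # (B, K)
    with torch.no_grad():
        T = trans_net(x)                           # T(x) is fixed at this stage
    noisy_post = torch.bmm(clean_post.unsqueeze(1), T).squeeze(1)
    return F.nll_loss(torch.log(noisy_post + 1e-12), y_noisy)
```

The per-row softmax keeps every T(x) row-stochastic, so the corrected posterior is always a valid distribution; restricting the fit of T(x) to distilled examples with one-hot Bayes optimal labels is what shrinks the feasible solution space relative to estimating a CLTM.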
