Abstract
Noisy labels are widespread in real-world data and degrade classification performance. Popular methods require a known noise distribution or additional cleaning supervision, which is usually unavailable in practical scenarios. This paper presents a theoretical statistical method and designs a label confidence inference (LISR) algorithm to address this issue. For the data distribution, we define a statistical function of label inconsistency and analyze its relationship with the neighbor radius. For the data representation, we define the notions of trusted-neighbor, nearest-trusted-neighbor, and untrusted-neighbor. For noisy label recognition, we present three inference methods that predict labels and their confidence. The LISR algorithm establishes a practical statistical model, queries the initial trusted instances, and then iteratively searches for trusted instances and corrects labels. We conducted experiments on synthetic, UCI, and classic image datasets. Significance tests verified the effectiveness of LISR and its superiority over state-of-the-art noisy-label learning algorithms.