An important stage of most state-of-the-art (SOTA) noisy-label learning methods consists of a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subsets. The process of sample selection typically consists of one of the two approaches: loss-based sampling, where high-loss samples are considered to have noisy labels, or feature-based sampling, where samples from the same class tend to cluster together in the feature space and noisy-label samples are identified as anomalies within those clusters. Empirically, loss-based sampling is robust to a wide range of noise rates, while feature-based sampling tends to work effectively in particular scenarios, e.g., the filtering of noisy instances via their eigenvectors (FINE) sampling exhibits greater robustness in scenarios with low noise rates, and the K nearest neighbour (KNN) sampling mitigates better high noise-rate problems. This paper introduces the Adaptive Nearest Neighbours and Eigenvector-based (ANNE) sample selection methodology, a novel approach that integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise rate scenarios. ANNE achieves this integration by first partitioning the training set into high-loss and low-loss sub-groups using loss-based sampling. Subsequently, within the low-loss subset, sample selection is performed using FINE, while the high-loss subset employs Adaptive KNN for effective sample selection. We integrate ANNE into the noisy-label learning state of the art (SOTA) method SSR+, and test it on CIFAR-10/-100 (with symmetric, asymmetric and instance-dependent noise), Webvision and ANIMAL-10, where our method shows better accuracy than the SOTA in most experiments, with a competitive training time. The code is available at https://github.com/filipe-research/anne.
Read full abstract