Abstract

Label noise in training data can significantly degrade a model's generalization performance in supervised learning tasks. Here we focus on the setting where noisy labels arise primarily from mislabeled confusing samples: samples whose features are equivocal and which tend to concentrate near decision boundaries rather than being uniformly distributed. To address this problem, we propose an ensemble learning method that corrects noisy labels by exploiting the local structures of feature manifolds. Unlike typical ensemble strategies, which increase prediction diversity among sub-models via additional loss terms, our method trains sub-models on disjoint subsets, each a union of randomly selected seed samples' same-class nearest neighbors on the data manifold. As a result, only a limited number of sub-models are affected by locally concentrated noisy labels, and each sub-model learns a coarse representation of the data manifold along with a corresponding graph. The constructed graphs suggest a set of label-correction candidates, from which our method determines the final corrections by majority vote. Experiments on real-world noisy-label datasets demonstrate the superiority of the proposed method over existing state-of-the-art approaches.
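The core pipeline described in the abstract can be sketched in a simplified form. The toy example below is an assumption-laden illustration, not the paper's implementation: it builds disjoint training subsets by growing same-class nearest-neighbor sets around random seeds, trains one very simple sub-model per subset (a nearest-centroid classifier standing in for the paper's manifold-plus-graph sub-models), and replaces each label with the sub-models' majority decision. All names, the number of sub-models `M`, and the neighborhood size `k` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs, with labels flipped on the
# most "confusing" points, i.e. those nearest the decision boundary.
n_per = 100
X = np.vstack([rng.normal(-2, 1, (n_per, 2)), rng.normal(2, 1, (n_per, 2))])
y_true = np.repeat([0, 1], n_per)
y_noisy = y_true.copy()
flip = np.argsort(np.abs(X[:, 0]))[:12]   # boundary-concentrated noise
y_noisy[flip] = 1 - y_noisy[flip]

M, k = 5, 18  # number of sub-models, neighbors per seed (illustrative)

def build_disjoint_subsets(X, labels, M, k, rng):
    """For each sub-model, pick one unused seed per class and take its k
    unused same-class nearest neighbors; subsets are kept disjoint."""
    unused = np.ones(len(X), dtype=bool)
    subsets = []
    for _ in range(M):
        idx = []
        for c in np.unique(labels):
            cand = np.flatnonzero(unused & (labels == c))
            if len(cand) == 0:
                continue
            seed = rng.choice(cand)
            d = np.linalg.norm(X[cand] - X[seed], axis=1)
            take = cand[np.argsort(d)[:k]]
            idx.extend(take.tolist())
            unused[take] = False
        subsets.append(np.array(idx))
    return subsets

def centroid_predict(X_tr, y_tr, X_q):
    """Minimal stand-in sub-model: nearest class centroid."""
    classes = np.unique(y_tr)
    cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_q[:, None] - cents[None], axis=2)
    return classes[d.argmin(axis=1)]

subsets = build_disjoint_subsets(X, y_noisy, M, k, rng)
votes = np.stack([centroid_predict(X[s], y_noisy[s], X) for s in subsets])
# Majority decision across sub-models becomes the corrected label.
y_corrected = (votes.sum(axis=0) > M / 2).astype(int)

print("label errors before correction:", (y_noisy != y_true).sum())
print("label errors after correction: ", (y_corrected != y_true).sum())
```

Because the noisy labels are concentrated near the boundary, each disjoint subset absorbs only a few of them, so most sub-models still vote for the clean label; this is the intuition the abstract attributes to the disjoint-subset construction.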
