Label distribution‐based noise correction for multiclass crowdsourcing

Ziqi Chen, Chaoqun Li, Liangxiao Jiang

doi:10.1002/int.22812

Abstract

In crowdsourcing scenarios, we can often obtain multiple noisy labels for each instance from different crowd workers and then use a label integration method to infer its integrated label. Despite the effectiveness of label integration methods, some label noise remains in the integrated labels. To reduce the impact of this noise, noise correction has attracted much attention from researchers, and a number of noise correction methods have been proposed in recent years. Among them, Between-class Margin-based Noise Correction (BMNC) has demonstrated remarkable denoising performance. However, BMNC can only handle binary classification tasks. For multiclass classification tasks, this paper proposes a simple but effective noise correction method, which we call Label Distribution-based Noise Correction (LDNC). First, LDNC transforms each instance's multiple noisy labels into a label distribution. Then, LDNC uses the margin between the largest and second-largest label probabilities in the label distribution to identify and filter out potentially noisy instances, thereby splitting the data into a clean set and a noise set. Finally, LDNC builds a classifier on the clean set and uses it to relabel all instances in the noise set. Experimental results on 16 simulated and one real-world multiclass crowdsourced data sets show that LDNC significantly outperforms all the other existing state-of-the-art noise correction methods.
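The three steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label distribution is assumed to be the fraction of crowd votes per class, `margin_threshold` is a hypothetical fixed cutoff (the abstract does not say how the margin is thresholded), and a nearest-centroid classifier stands in for whatever base classifier LDNC actually uses. All function names are hypothetical.

```python
from collections import Counter
from math import dist


def label_distribution(noisy_labels, num_classes):
    """Step 1 (assumed form): per-class vote fractions for one instance."""
    counts = Counter(noisy_labels)
    return [counts[c] / len(noisy_labels) for c in range(num_classes)]


def top_two_margin(distribution):
    """Gap between the largest and second-largest label probabilities."""
    first, second = sorted(distribution, reverse=True)[:2]
    return first - second


def fit_centroids(features, labels):
    """Stand-in classifier (assumption): mean feature vector per class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def ldnc_sketch(features, crowd_labels, num_classes, margin_threshold=0.5):
    """Hypothetical LDNC-style pipeline; returns (corrected labels, noise indices)."""
    distributions = [label_distribution(ls, num_classes) for ls in crowd_labels]
    # Integrated label: class with the highest vote fraction (majority voting;
    # ties go to the lowest class index).
    integrated = [max(range(num_classes), key=d.__getitem__) for d in distributions]
    # Step 2: a small top-two margin marks an instance as possibly noisy.
    clean = [i for i, d in enumerate(distributions)
             if top_two_margin(d) >= margin_threshold]
    clean_set = set(clean)
    noise = [i for i in range(len(features)) if i not in clean_set]
    corrected = list(integrated)
    if not clean:
        return corrected, noise  # nothing to train on; keep integrated labels
    # Step 3: train on the clean set, then relabel every noise-set instance.
    centroids = fit_centroids([features[i] for i in clean],
                              [integrated[i] for i in clean])
    for i in noise:
        corrected[i] = predict_centroid(centroids, features[i])
    return corrected, noise


if __name__ == "__main__":
    features = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1), (4.8, 5.1)]
    crowd_labels = [[0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 1, 2],
                    [1, 1, 1, 1], [1, 1, 0, 2], [2, 2, 1, 1]]
    corrected, noise = ldnc_sketch(features, crowd_labels, num_classes=3)
    print(noise)      # [4, 5]
    print(corrected)  # [0, 0, 1, 1, 0, 1]
```

In this toy run, instance 4 has a majority-voted label of 1 but a margin of only 0.25, so it falls into the noise set and is relabeled 0 by the classifier trained on the clean instances. The actual margin criterion and base classifier used by LDNC are not given in this abstract, so both choices here should be read as placeholders.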
