Abstract

Crowdsourcing services provide an economical and efficient means of acquiring multiple noisy labels for each training instance in supervised learning. Ground truth inference methods, also known as consensus methods, are then used to obtain the integrated labels of training instances. Although consensus methods are effective, a level of noise still remains in the set of integrated labels. It is therefore necessary to handle noise in the integrated labels to improve label and model quality. In this paper, we propose a resampling-based noise correction method (RNC for short). Unlike previous label noise correction methods for crowdsourcing, RNC first employs a filter to split the training data into a clean set and a noisy set, and then repeatedly resamples the clean and noisy sets several times according to a certain proportion. Finally, multiple classifiers built on the resampled data sets are used to re-label the training data. Experimental results on 18 simulated data sets and five real-world data sets demonstrate that RNC rarely degrades label and model quality compared to three other state-of-the-art noise correction methods and, in many cases, improves quality dramatically.
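The pipeline described above (filter, proportional resampling, ensemble re-labeling) can be sketched as follows. This is an illustrative sketch only: the choice of a cross-validation disagreement filter, the decision-tree base learner, and the proportion `p` and ensemble size `T` are assumptions for demonstration, not the paper's exact settings.

```python
# Illustrative RNC-style sketch: filter -> proportional resampling -> ensemble re-labeling.
# The filter rule, base learner, and parameters p/T are assumptions, not the authors' settings.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

def rnc_sketch(X, y, p=0.8, T=10, seed=0):
    """Return re-labeled integrated labels for (X, y).

    p : assumed proportion of each resampled set drawn from the clean set.
    T : assumed number of resampled sets / classifiers.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n = len(y)

    # Step 1 (filter): flag instances whose cross-validated prediction
    # disagrees with their integrated label as "noisy".
    pred = cross_val_predict(DecisionTreeClassifier(random_state=seed), X, y, cv=5)
    clean_idx = np.flatnonzero(pred == y)
    noisy_idx = np.flatnonzero(pred != y)
    if len(clean_idx) == 0 or len(noisy_idx) == 0:
        return y.copy()  # nothing to resample against

    votes = np.zeros((n, len(classes)), dtype=int)
    for t in range(T):
        # Step 2 (resample): draw a proportion p from the clean set and
        # the remainder from the noisy set, with replacement.
        n_clean = int(round(p * n))
        sample = np.concatenate([
            rng.choice(clean_idx, size=n_clean, replace=True),
            rng.choice(noisy_idx, size=n - n_clean, replace=True),
        ])
        # Step 3 (re-label): train a classifier on the resampled set
        # and record its vote for every training instance.
        clf = DecisionTreeClassifier(random_state=seed + t).fit(X[sample], y[sample])
        out = clf.predict(X)
        for ci, c in enumerate(classes):
            votes[:, ci] += (out == c)

    # Majority vote across the T classifiers gives the corrected labels.
    return classes[np.argmax(votes, axis=1)]
```

In this sketch the ensemble's majority vote replaces every integrated label; a variant closer in spirit to correction-only methods would overwrite labels just for the instances in the noisy set.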
