Abstract

Crowdsourcing services provide an economical and efficient means of acquiring multiple noisy labels for each training instance in supervised learning. Ground truth inference methods, also known as consensus methods, are then used to obtain the integrated labels of training instances. Although consensus methods are effective, a level of noise still remains in the set of integrated labels. It is therefore necessary to handle noise in the integrated labels to improve label and model quality. In this paper, we propose a resampling-based noise correction method (RNC for short). Unlike previous label noise correction methods for crowdsourcing, RNC first employs a filter to split the training data into a clean set and a noisy set, and then repeatedly resamples the clean and noisy sets several times according to a certain proportion. Finally, multiple classifiers built on the resampled data sets are used to re-label the training data. Experimental results on 18 simulated data sets and five real-world data sets demonstrate that RNC rarely degrades label and model quality compared to three other state-of-the-art noise correction methods and, in many cases, improves quality dramatically.
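The pipeline described above (filter, proportional resampling, ensemble re-labeling) can be sketched as follows. This is an illustrative sketch only: the choice of a cross-validation disagreement filter, the decision-tree base learner, and the proportion `p` and ensemble size `T` are assumptions for demonstration, not the paper's exact settings.

```python
# Illustrative RNC-style sketch: filter -> proportional resampling -> ensemble re-labeling.
# The filter rule, base learner, and parameters p/T are assumptions, not the authors' settings.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

def rnc_sketch(X, y, p=0.8, T=10, seed=0):
    """Return re-labeled integrated labels for (X, y).

    p : assumed proportion of each resampled set drawn from the clean set.
    T : assumed number of resampled sets / classifiers.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n = len(y)

    # Step 1 (filter): flag instances whose cross-validated prediction
    # disagrees with their integrated label as "noisy".
    pred = cross_val_predict(DecisionTreeClassifier(random_state=seed), X, y, cv=5)
    clean_idx = np.flatnonzero(pred == y)
    noisy_idx = np.flatnonzero(pred != y)
    if len(clean_idx) == 0 or len(noisy_idx) == 0:
        return y.copy()  # nothing to resample against

    votes = np.zeros((n, len(classes)), dtype=int)
    for t in range(T):
        # Step 2 (resample): draw a proportion p from the clean set and
        # the remainder from the noisy set, with replacement.
        n_clean = int(round(p * n))
        sample = np.concatenate([
            rng.choice(clean_idx, size=n_clean, replace=True),
            rng.choice(noisy_idx, size=n - n_clean, replace=True),
        ])
        # Step 3 (re-label): train a classifier on the resampled set
        # and record its vote for every training instance.
        clf = DecisionTreeClassifier(random_state=seed + t).fit(X[sample], y[sample])
        out = clf.predict(X)
        for ci, c in enumerate(classes):
            votes[:, ci] += (out == c)

    # Majority vote across the T classifiers gives the corrected labels.
    return classes[np.argmax(votes, axis=1)]
```

In this sketch the ensemble's majority vote replaces every integrated label; a variant closer in spirit to correction-only methods would overwrite labels just for the instances in the noisy set.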
