Abstract

In crowdsourcing scenarios, we can obtain a set of multiple noisy labels for each instance from different workers and then use a ground truth inference algorithm to infer its integrated label. Despite the effectiveness of ground truth inference algorithms, integrated labels still contain a certain level of noise. To reduce the impact of this noise, many noise correction algorithms have been proposed in recent years. To the best of our knowledge, almost all of these algorithms assume that workers have the same labeling certainty across different classes and instances. However, this assumption rarely holds in practice because workers differ in their individual preferences and cognitive abilities. In this paper, we argue that the labeling certainty of a worker should be both class-dependent and instance-dependent. Based on this premise, we propose a certainty weighted voting-based noise correction (CWVNC) algorithm. First, we use the consistency between each worker's labels and the integrated labels on different classes to estimate the class-dependent certainty. Then, we train a probability-based classifier separately on the instances labeled by each worker and use it to estimate the instance-dependent certainty. Finally, we correct the integrated label of each instance by weighted voting based on the class-dependent certainty and the instance-dependent certainty. In our experiments, CWVNC achieves an average noise ratio of 15.08% on 34 simulated datasets, and noise ratios of 25.77% and 26.94% on the two real-world datasets “Income” and “Music_genre”, respectively. The results show that CWVNC significantly outperforms all other state-of-the-art noise correction algorithms used for comparison.
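The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so several choices here are assumptions. Class-dependent certainty is taken to be the fraction of a worker's votes for a class that agree with the integrated label. Instance-dependent certainty is passed in as precomputed probabilities rather than produced by a trained classifier. The two certainties are combined by multiplication, and ties keep the current integrated label. The function and parameter names (`class_certainty`, `correct_labels`, `instance_certainty`) are hypothetical.

```python
from collections import defaultdict


def class_certainty(worker_labels, integrated):
    """Estimate a per-worker, per-class certainty (assumed definition).

    worker_labels: {worker: {instance: label}}
    integrated:    {instance: label} from a ground truth inference step
    Returns {worker: {label: share of the worker's votes for `label`
    that agree with the integrated label}}.
    """
    certainty = {}
    for worker, labels in worker_labels.items():
        given = defaultdict(int)
        agreed = defaultdict(int)
        for inst, lab in labels.items():
            given[lab] += 1
            if integrated[inst] == lab:
                agreed[lab] += 1
        certainty[worker] = {lab: agreed[lab] / given[lab] for lab in given}
    return certainty


def correct_labels(worker_labels, integrated, instance_certainty):
    """Correct integrated labels by certainty-weighted voting (sketch).

    instance_certainty: {worker: {instance: probability}}, assumed to come
    from a probabilistic classifier trained on that worker's labels.
    Each vote is weighted by class certainty * instance certainty
    (the product is an assumption, not taken from the paper).
    """
    cls_cert = class_certainty(worker_labels, integrated)
    corrected = {}
    for inst, current in integrated.items():
        scores = defaultdict(float)
        for worker, labels in worker_labels.items():
            if inst in labels:
                lab = labels[inst]
                weight = cls_cert[worker][lab] * instance_certainty[worker][inst]
                scores[lab] += weight
        best = max(scores.values(), default=0.0)
        # Change the label only if another label strictly outscores it.
        if scores and scores.get(current, 0.0) < best:
            corrected[inst] = max(scores, key=scores.get)
        else:
            corrected[inst] = current
    return corrected
```

With this weighting, a vote from a worker who is often consistent on a class but uncertain about a particular instance counts for less than a plain majority vote would allow, which is the intuition behind making certainty both class-dependent and instance-dependent.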

