In crowdsourcing learning, the integrated labels produced by label integration algorithms still contain a certain degree of label noise. To reduce the impact of this noise, many scholars have focused on noise correction algorithms. Existing noise correction algorithms usually divide a data set with integrated labels directly into a clean set and a noise set. However, the clean set still contains some noise instances, and the noise set still contains some clean instances. To reduce the interference from noise instances and improve the utilization of clean instances, we propose a three-way decision-based noise correction (TDNC) algorithm inspired by three-way decision theory. First, we calculate the compactness of each instance's multiple noisy label set and estimate the label certainty of each instance's integrated label. Second, we divide the data set with integrated labels into three disjoint sets: a positive set, a boundary set, and a negative set. Third, we further divide the instances in the boundary set and add them to the positive set or the negative set according to nearest neighbor classification. Finally, we train two heterogeneous classifiers on the new positive set and use a consensus voting strategy to correct the noise instances in the new negative set. Experimental results on 34 simulated and two real-world data sets consistently show that TDNC significantly outperforms existing state-of-the-art noise correction algorithms in terms of the noise ratio.
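The four steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the compactness or certainty formulas, so several pieces are assumptions. Label certainty is approximated by the fraction of crowd labels that agree with the integrated label. The thresholds `alpha` and `beta`, the rule that a boundary instance joins the positive set when its nearest positive neighbor shares its label, and the two stand-in classifiers (1-nearest-neighbor and nearest-centroid) are all hypothetical choices made for illustration.

```python
import math

def label_certainty(noisy_labels, integrated_label):
    # Assumed proxy: share of crowd labels agreeing with the integrated label.
    agree = sum(1 for lab in noisy_labels if lab == integrated_label)
    return agree / len(noisy_labels)

def three_way_split(certainties, alpha=0.75, beta=0.5):
    # alpha and beta are hypothetical thresholds, not values from the paper.
    positive, boundary, negative = [], [], []
    for i, c in enumerate(certainties):
        if c >= alpha:
            positive.append(i)      # likely clean
        elif c <= beta:
            negative.append(i)      # likely noisy
        else:
            boundary.append(i)      # decision deferred
    return positive, boundary, negative

def nn_predict(x, features, labels, train_idx):
    # 1-nearest-neighbor prediction using Euclidean distance.
    nearest = min(train_idx, key=lambda j: math.dist(x, features[j]))
    return labels[nearest]

def centroid_predict(x, features, labels, train_idx):
    # Nearest-centroid prediction: one mean vector per class.
    groups = {}
    for j in train_idx:
        groups.setdefault(labels[j], []).append(features[j])
    centroids = {lab: [sum(col) / len(pts) for col in zip(*pts)]
                 for lab, pts in groups.items()}
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def resolve_boundary(features, labels, positive, boundary, negative):
    # Assumed rule: a boundary instance is treated as clean when its nearest
    # positive-set neighbor carries the same integrated label.
    new_pos, new_neg = list(positive), list(negative)
    for i in boundary:
        predicted = nn_predict(features[i], features, labels, positive)
        (new_pos if predicted == labels[i] else new_neg).append(i)
    return new_pos, new_neg

def correct_noise(features, labels, new_pos, new_neg):
    # Consensus voting: relabel only when both classifiers agree.
    corrected = list(labels)
    for i in new_neg:
        a = nn_predict(features[i], features, labels, new_pos)
        b = centroid_predict(features[i], features, labels, new_pos)
        if a == b:
            corrected[i] = a
    return corrected

# Toy data (invented for illustration): two clusters, three suspicious labels.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0),
            (1.0, 0.9), (0.05, 0.05), (0.95, 0.95), (0.1, 0.1)]
crowd = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1],
         [1, 1, 0, 1, 0], [1, 0, 1, 1, 0], [1, 1, 0, 0]]
integrated = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # assumed output of label integration

certainties = [label_certainty(c, y) for c, y in zip(crowd, integrated)]
positive, boundary, negative = three_way_split(certainties)
new_pos, new_neg = resolve_boundary(features, integrated,
                                    positive, boundary, negative)
corrected = correct_noise(features, integrated, new_pos, new_neg)
print(corrected)
```

In this toy run, instances 6 and 7 fall in the boundary set because only three of their five crowd labels match the integrated label. The nearest-neighbor check then sends instance 6 to the negative set and instance 7 to the positive set, so only instances 6 and 8 are candidates for relabeling.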