Abstract

In crowdsourcing learning, the integrated labels produced by label integration algorithms still contain a certain degree of label noise. To reduce the impact of this noise, many researchers have focused on noise correction algorithms. Existing noise correction algorithms usually divide a data set with integrated labels directly into a clean set and a noise set. However, the clean set still contains some noise instances, and the noise set still contains some clean instances. To reduce the interference from noise instances and improve the utilization of clean instances, we propose a three-way decision-based noise correction (TDNC) algorithm, inspired by three-way decision theory. First, we calculate the compactness of each instance's multiple noisy label set and estimate the label certainty of each instance's integrated label. Second, we divide the data set with integrated labels into three disjoint sets: a positive set, a boundary set, and a negative set. Third, we assign each instance in the boundary set to either the positive set or the negative set by nearest neighbor classification. Finally, we train two heterogeneous classifiers on the new positive set and use a consensus voting strategy to correct the noise instances in the new negative set. Experimental results on 34 simulated and two real-world data sets consistently show that TDNC significantly outperforms existing state-of-the-art noise correction algorithms in terms of noise ratio.
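The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not define the compactness measure, the certainty estimate, the split thresholds, or the two classifiers. Everything below is therefore an assumption made for illustration: majority voting as the label integration step, the agreement ratio as a stand-in for label certainty, the hypothetical thresholds `low` and `high`, a boundary rule that keeps an instance in the positive set when its nearest positive neighbor shares its integrated label, and 1-nearest-neighbor plus nearest-centroid as the two heterogeneous classifiers.

```python
from collections import Counter, defaultdict
from math import dist


def integrate(crowd):
    """Majority vote over one instance's crowd labels (assumed integration step)."""
    return Counter(crowd).most_common(1)[0][0]


def certainty(crowd, label):
    """Assumed proxy for label certainty: share of crowd labels equal to `label`."""
    return crowd.count(label) / len(crowd)


def nn_predict(x, X, y):
    """1-nearest-neighbor prediction with Euclidean distance."""
    nearest = min(range(len(X)), key=lambda j: dist(x, X[j]))
    return y[nearest]


def centroid_predict(x, X, y):
    """Nearest-centroid prediction: pick the class whose mean point is closest."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)]
                 for c, pts in groups.items()}
    return min(centroids, key=lambda c: dist(x, centroids[c]))


def tdnc_sketch(X, crowd_labels, low=0.6, high=0.8):
    """Illustrative three-way split and correction; thresholds are hypothetical."""
    y = [integrate(c) for c in crowd_labels]
    cert = [certainty(c, label) for c, label in zip(crowd_labels, y)]

    # Step 2: three disjoint sets based on the assumed certainty score.
    pos = [i for i, c in enumerate(cert) if c >= high]
    neg = [i for i, c in enumerate(cert) if c <= low]
    bnd = [i for i, c in enumerate(cert) if low < c < high]
    if not pos:
        return y  # nothing trustworthy to learn from; leave labels unchanged

    # Step 3: resolve the boundary set against the original positive set.
    Xp, yp = [X[i] for i in pos], [y[i] for i in pos]
    new_pos, new_neg = list(pos), list(neg)
    for i in bnd:
        agrees = nn_predict(X[i], Xp, yp) == y[i]
        (new_pos if agrees else new_neg).append(i)

    # Step 4: relabel a negative instance only when both classifiers agree.
    Xt, yt = [X[i] for i in new_pos], [y[i] for i in new_pos]
    corrected = list(y)
    for i in new_neg:
        a = nn_predict(X[i], Xt, yt)
        b = centroid_predict(X[i], Xt, yt)
        if a == b:
            corrected[i] = a
    return corrected


# Toy data: two clusters; instance 6 sits in cluster 0 but is voted 1 by the crowd.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0),
     (1.1, 1.0), (1.0, 1.1), (0.05, 0.05), (1.05, 1.05)]
crowd = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1], [1, 1, 1, 1, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0, 0]]
print(tdnc_sketch(X, crowd))  # -> [0, 0, 0, 1, 1, 1, 0, 1]
```

In this toy run, instance 6 falls into the negative set (agreement 3/5), both classifiers predict label 0 for it, and its integrated label is corrected from 1 to 0. Instance 7 starts in the boundary set (agreement 5/7) and moves to the positive set because its nearest positive neighbor carries the same label. Requiring consensus before relabeling is the conservative choice: when the two classifiers disagree, the integrated label is left untouched.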
