In crowdsourcing learning, label integration is often used to infer each instance’s integrated label from its multiple noisy labels. However, almost all existing label integration algorithms apply the same strategy to every instance, which limits their performance. This paper argues that different instances should be handled by different label integration strategies. Drawing on three-way decision theory, we propose a three-way decision-based label integration (TDLI) algorithm. In TDLI, we first evaluate the label quality of each instance and its K-nearest neighbors, and then use these evaluations to divide the whole crowdsourced dataset into three disjoint subsets: the positive set, the boundary set, and the negative set. For each instance in the positive set, we directly apply simple majority voting (MV) to infer its integrated label. For each instance in the boundary set, we add its K-nearest neighbors’ noisy labels to its own and infer its integrated label by weighted MV. For each instance in the negative set, we train a classifier on the union of the positive and boundary sets and infer its integrated label by fusing the distribution of its own noisy labels with the label distribution predicted by the classifier. Extensive experiments demonstrate that TDLI clearly outperforms all the other label integration algorithms used for comparison.
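The three-way split described above can be illustrated with a minimal Python sketch. It is not the TDLI implementation: the abstract does not state how label quality is measured or where the set boundaries lie, so the quality score (share of votes won by the majority label), the thresholds `alpha` and `beta`, and all function names are assumptions made for illustration. The sketch also scores each instance from its own labels only, whereas TDLI additionally uses the K-nearest neighbors.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label in a list of noisy labels."""
    return Counter(labels).most_common(1)[0][0]


def label_quality(labels):
    """Hypothetical quality score: share of votes won by the majority label."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def three_way_partition(noisy_labels, alpha=0.8, beta=0.6):
    """Split instance indices into positive, boundary and negative sets.

    alpha and beta are illustrative thresholds, not values from the paper.
    """
    positive, boundary, negative = [], [], []
    for i, labels in enumerate(noisy_labels):
        q = label_quality(labels)
        if q >= alpha:
            positive.append(i)   # high agreement: plain MV is enough
        elif q >= beta:
            boundary.append(i)   # moderate agreement: would borrow neighbors' labels
        else:
            negative.append(i)   # low agreement: would defer to a trained classifier
    return positive, boundary, negative


# Toy data: one list of worker labels per instance.
noisy_labels = [
    [1, 1, 1, 1, 1],  # unanimous, quality 1.0
    [1, 1, 0],        # quality 2/3
    [0, 1, 0, 1, 2],  # quality 2/5
]
positive, boundary, negative = three_way_partition(noisy_labels)

# Only the positive set is resolved here, by plain majority voting.
integrated = {i: majority_vote(noisy_labels[i]) for i in positive}
```

On this toy data the three instances fall into the positive, boundary, and negative sets respectively. The boundary and negative branches are left as comments because the weighting scheme and the classifier are not specified in the abstract.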