Crowdsourcing systems provide an easy way to obtain labels for data. Each instance is usually labeled by multiple crowd labelers who are not experts, so it is important to design effective ground truth inference algorithms that infer an integrated label from the multiple crowd labels. While most ground truth inference algorithms perform well when many crowd labels are available, few perform well when only a small number of crowd labels are available. This paper treats handling the noise in multiple crowd labels as the key to good ground truth inference and solves ground truth inference using robust classifiers. It proposes two versions of a ground truth inference algorithm based on robust logistic regression, which address two problems: (1) how to embed the noise level into the loss function of logistic regression and (2) how to estimate the parameters that model the noise level in the crowdsourcing scenario. We call our algorithms robust logistic regression inference (RLRI). Because it builds on robust classifiers, RLRI still performs well when only a small number of labels is available. We also compare the advantages and disadvantages of the two versions of RLRI theoretically. Finally, the performance of our algorithms is verified on benchmark and real-world datasets.
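The abstract does not state the RLRI loss function. As a rough illustration of what embedding a noise level into the loss of logistic regression can look like, the following minimal sketch swaps in a standard label-flip noise model, which is not necessarily the paper's formulation. It assumes the true label y of instance x follows p = σ(wᵀx + b), and that each crowd label flips a true 0 to 1 with rate ρ0 and a true 1 to 0 with rate ρ1, so P(crowd label = 1 | x) = ρ0 + (1 − ρ0 − ρ1)·p. The noise rates are assumed known and fixed here, whereas RLRI estimates its noise parameters. The function names `fit_noisy_logreg` and `infer_labels` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_noisy_logreg(
    X: Sequence[Sequence[float]],
    crowd_labels: Sequence[Sequence[int]],
    rho0: float,
    rho1: float,
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit (w, b) by gradient descent on the flip-noise negative log-likelihood.

    Hypothetical sketch: rho0 = P(crowd label 1 | true 0) and
    rho1 = P(crowd label 0 | true 1) are assumed known and fixed.
    """
    assert 0.0 < rho0 and 0.0 < rho1 and rho0 + rho1 < 1.0
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    k = 1.0 - rho0 - rho1
    n_labels = sum(len(labels) for labels in crowd_labels)
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for x, labels in zip(X, crowd_labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)  # P(true label = 1)
            q = rho0 + k * p  # P(crowd label = 1); lies in [rho0, 1 - rho1]
            for y_obs in labels:
                # Derivative of -log P(y_obs | x) w.r.t. the logit z, via q(p(z)).
                g = (q - y_obs) / (q * (1.0 - q)) * k * p * (1.0 - p)
                for j in range(d):
                    gw[j] += g * x[j]
                gb += g
        # Average over all crowd labels and take one gradient step.
        w = [wj - lr * gj / n_labels for wj, gj in zip(w, gw)]
        b -= lr * gb / n_labels
    return w, b


def infer_labels(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[int]:
    # The integrated label comes from the clean-label probability p, not from q.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0 for x in X]


# Toy data: six 1-D instances, each with three noisy crowd labels.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
crowd = [[0, 0, 1], [0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]]
w, b = fit_noisy_logreg(X, crowd, rho0=0.2, rho1=0.2)
print(infer_labels(X, w, b))
```

The design choice worth noting is the separation between p and q: the classifier is trained against the noisy-label probability q, so the noise rates absorb some of the label flips, while the integrated label is read from p.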