Inferring ground truth from crowdsourced data under local attribute differential privacy

Di Wang,Jinhui Xu

doi:10.1016/j.tcs.2021.02.039

Abstract

Nowadays, crowdsourcing gains an increasing popularity as it can be adopted to solve many challenging question answering tasks that are easy for humans but difficult for computers. Due to the variety in the quality of users, it is important to infer not only the underlying ground truth of these tasks but also the users ability from the answers given by users. This problem is called Ground Truth Inference and has been studied for many years. However, since the answers collected from the users may contain sensitive information, ground truth inference raises serious privacy concern. Due to this reason, the problem of ground truth inference under local differential privacy (LDP) model has been recently studied. However, this problem is still not well understood and even some basic questions have not been solved yet. First, it is still unknown what is the average error of the private estimators to the underlying ground truth. Secondly, we do not know whether we can infer the ability of each user under LDP model and what is the estimation error w.r.t. the underlying users ability. Finally, previous work only shows that their methods have better performance than the private major voting algorithm through experiments. However, there is still no theoretically result which shows this priority formally or mathematically. In this paper, we partially solve these problems by studying the ground truth inference problem under local attribute differential privacy (LADP) model, which is a relaxation of LDP model, and propose a new algorithm called private Dawid-Skene method, which is motivated by the classical Dawid-Skene method. Specifically, we first provide the estimation errors for both ability of users and the ground truth under some assumptions of the problem if the algorithm start with some appropriate initial vector. Moreover, we propose an explicit instance and show that the estimation error of the ground truth achieved by the private major voting algorithm is always greater than the error achieved by our method.

Full Text