A Lightweight, Effective, and Efficient Model for Label Aggregation in Crowdsourcing

Yi Yang,Xingrui Zhuo,Quan Bai,Gongqing Wu,Zhong-Qiu Zhao,Weihua Li,Qing Liu

doi:10.1145/3630102

Yi Yang, Xingrui Zhuo + Show 5 more

Open Access

PDF Available

https://doi.org/10.1145/3630102

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Due to the presence of noise in crowdsourced labels, label aggregation (LA) has become a standard procedure for post-processing these labels. LA methods estimate true labels from crowdsourced labels by modeling worker quality. However, most existing LA methods are iterative in nature. They require multiple passes through all crowdsourced labels, jointly and iteratively updating true labels and worker qualities until a termination condition is met. As a result, these methods are burdened with high space and time complexities, which restrict their applicability in scenarios where scalability and online aggregation are essential. Furthermore, defining a suitable termination condition for iterative algorithms can be challenging. In this article, we view LA as a dynamic system and represent it as a Dynamic Bayesian Network. From this dynamic model, we derive two lightweight and scalable algorithms: LAonepassand LAtwopass. These algorithms can efficiently and effectively estimate worker qualities and true labels by traversing all labels at most twice, thereby eliminating the need for explicit termination conditions and multiple traversals over the crowdsourced labels. Due to their dynamic nature, the proposed algorithms are also capable of performing label aggregation online. We provide theoretical proof of the convergence property of the proposed algorithms and bound the error of the estimated worker qualities. Furthermore, we analyze the space and time complexities of our proposed algorithms, demonstrating their equivalence to those of majority voting. Through experiments conducted on 20 real-world datasets, we demonstrate that our proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings, even though they traverse all labels at most twice. The code is onhttps://github.com/yyang318/LA_onepass.

Full Text