Abstract

In recent years, crowdsourcing has emerged as a new computing paradigm for bridging the gap between human- and machine-based computation. As one of the core operations in data retrieval, we study top-k queries with crowdsourcing, namely crowd-enabled top-k queries. This problem is formulated with three key factors: latency, monetary cost, and quality of answers. We first aim to design a novel framework that minimizes monetary cost when latency is constrained. Toward this goal, we employ a two-phase framework parameterized by two quantities, called buckets and ranges. On top of this framework, we develop three methods, greedy, equi-sized, and dynamic programming, to determine the buckets and ranges. By combining the three methods at each phase, we propose four algorithms: GdyBucket, EquiBucket, EquiRange, and CrowdK. When the crowd answers are imprecise, we also address improving the accuracy of the top-k answers. Lastly, using both simulated crowds and real crowds at Amazon Mechanical Turk, we evaluate the trade-offs among our proposals with respect to monetary cost, accuracy of answers, and running time. Compared to competing algorithms, CrowdK is found to reduce monetary cost by up to 20 times without sacrificing the accuracy of the top-k answers.
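To illustrate the kind of dynamic programming used to determine buckets, the following is a minimal sketch that partitions the top-k ranks into contiguous buckets so as to minimize a total monetary cost. The cost model (one paid microtask per pairwise comparison within a bucket) and the function names are assumptions for illustration only, not the paper's actual formulation, which also accounts for latency and ranges.

```python
# Hypothetical sketch: DP over contiguous bucket boundaries for a top-k query.
# The per-bucket cost model below is an assumption (pairwise comparisons),
# not the paper's actual cost function.

def bucket_cost(size):
    # Assumed monetary cost of resolving one bucket: one task per item pair.
    return size * (size - 1) // 2

def min_cost_partition(k, max_buckets):
    """Split ranks 1..k into at most max_buckets contiguous buckets,
    minimizing the summed bucket cost. Returns (cost, bucket ranges)."""
    INF = float("inf")
    # dp[b][i] = min cost to cover the first i ranks using exactly b buckets
    dp = [[INF] * (k + 1) for _ in range(max_buckets + 1)]
    choice = [[0] * (k + 1) for _ in range(max_buckets + 1)]
    dp[0][0] = 0
    for b in range(1, max_buckets + 1):
        for i in range(1, k + 1):
            for j in range(b - 1, i):  # last bucket covers ranks j+1..i
                c = dp[b - 1][j] + bucket_cost(i - j)
                if c < dp[b][i]:
                    dp[b][i] = c
                    choice[b][i] = j
    best_b = min(range(1, max_buckets + 1), key=lambda b: dp[b][k])
    # Walk back through the recorded choices to recover bucket boundaries.
    bounds, i, b = [], k, best_b
    while i > 0:
        j = choice[b][i]
        bounds.append((j + 1, i))
        i, b = j, b - 1
    return dp[best_b][k], bounds[::-1]
```

Under this assumed cost model, splitting k = 10 ranks into up to 3 buckets yields near-equal bucket sizes, since the quadratic per-bucket cost favors balanced partitions; the paper's equi-sized method exploits a similar intuition.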
