Abstract

This paper demonstrates PERC, our system for crowdsourced entity resolution in the presence of human errors. Entity Resolution (ER) is a critical step in data cleaning and analytics. Although many machine-based methods exist for the ER task, crowdsourcing is becoming increasingly important because humans can provide more insightful information for complex tasks, e.g., clustering of images and natural language processing. However, human workers still make mistakes, due to lack of domain expertise, inattention, task ambiguity, or even malicious intent. To this end, we present a system called PERC (<u>p</u>robabilistic <u>e</u>ntity <u>r</u>esolution with <u>c</u>rowd errors), which adopts an uncertain graph model to address the entity resolution problem with noisy crowd answers. In our framework, the ER problem becomes equivalent to finding the maximum-likelihood clustering. In particular, we propose a novel metric called "reliability" to measure the quality of a clustering, which takes into account the connectedness both inside and across all clusters. PERC then automatically selects the next question to ask the crowd that maximally increases the "reliability" of the current clustering. This demonstration highlights (1) a reliability-based next-question-selection framework for crowdsourced ER, which requires neither a user-defined threshold nor a priori information about the error rate of the crowd workers, (2) an improvement in ER quality of 15% and a reduction in crowdsourcing cost of 50% compared to state-of-the-art methods, and (3) a GUI that lets users compare different crowdsourced ER algorithms, their intermediate ER results as they progress, and their selected next crowdsourcing questions in a user-friendly manner. Our demonstration video is at: https://www.youtube.com/watch?v=rQ7nu3b8zXY.
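To make the two core ideas concrete, the sketch below models crowd answers as an uncertain graph (each record pair carries a match probability) and scores a clustering by a log-likelihood-style objective: intra-cluster pairs contribute log p and inter-cluster pairs contribute log(1 - p). The abstract does not give PERC's exact reliability formula or its question-selection rule, so both the scoring function and the "most uncertain pair" heuristic here are illustrative assumptions, not the paper's actual algorithm.

```python
import math
from itertools import combinations

def log_reliability(clusters, edge_prob):
    """Log-likelihood-style score of a clustering over an uncertain graph.

    clusters:  list of lists of record ids (a partition of the records).
    edge_prob: dict mapping a record pair (u, v) to the crowd-derived
               probability that u and v refer to the same entity.
    NOTE: an illustrative stand-in for PERC's "reliability" metric,
    which is not fully specified in the abstract.
    """
    cluster_of = {n: i for i, c in enumerate(clusters) for n in c}
    nodes = sorted(cluster_of)
    score = 0.0
    for u, v in combinations(nodes, 2):
        # Unobserved pairs default to maximal uncertainty (p = 0.5).
        p = edge_prob.get((u, v), edge_prob.get((v, u), 0.5))
        if cluster_of[u] == cluster_of[v]:
            score += math.log(p)        # pair placed together: reward high p
        else:
            score += math.log(1.0 - p)  # pair split apart: reward low p
    return score

def next_question(edge_prob, asked):
    """Greedy proxy for question selection: ask about the unasked pair
    whose match probability is closest to 0.5 (most uncertain)."""
    candidates = [e for e in edge_prob if e not in asked]
    return min(candidates, key=lambda e: abs(edge_prob[e] - 0.5), default=None)
```

For example, with probabilities {("a","b"): 0.9, ("a","c"): 0.2, ("b","c"): 0.3}, the clustering {a,b},{c} scores higher than the all-singletons clustering, and the most uncertain pair ("b","c") would be crowdsourced next. The maximum-likelihood view follows from treating each pairwise crowd answer as an independent noisy observation.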
