Abstract

Entity resolution is a fundamental task in data integration. Recent studies of this problem, including active learning, crowdsourcing, and pay-as-you-go approaches, have started to involve human users in the loop to carry out interactive entity resolution tasks, namely to ask human users to judge whether two entity descriptions refer to the same real-world entity. This judgment process requires tool support, particularly when entity descriptions contain a large number of features (i.e., property-value pairs). To facilitate judgment, in this article we propose to select, from the entity descriptions, a subset of critical features as a summary to be shown to and judged by human users. We prefer features that reflect the most commonalities shared by, and the most conflicts between, the two entities, and that carry the largest amount of characteristic and diverse information about them. Selected features are then grouped and ordered to improve readability and further speed up judgment. Experimental results demonstrate that summaries generated by our method help users judge more efficiently (3.57–3.78 times faster) than entire entity descriptions do, without significantly hurting the accuracy of judgment. The accuracy achieved with our method is also higher than that achieved with existing summarization methods.
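To make the selection idea concrete, the following is a minimal sketch in Python and not the method evaluated in the article. It assumes single-valued properties, a hypothetical per-property weight table (`weights`) standing in for how characteristic a feature is, and token overlap between values as a stand-in for redundancy. The function name `summarize_pair` and the scoring formula are illustrative assumptions only.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a value."""
    return set(text.lower().split())


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def summarize_pair(
    a: Dict[str, str],
    b: Dict[str, str],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, str, str, str]]:
    """Greedily pick up to k shared properties of entities a and b.

    Assumption: a feature's gain is its (hypothetical) weight, discounted
    by its token overlap with features that were already selected.
    """
    # Only properties present in both descriptions can show a
    # commonality ("match") or a conflict between the two entities.
    candidates = []
    for prop in sorted(a.keys() & b.keys()):
        same = a[prop].strip().lower() == b[prop].strip().lower()
        candidates.append((prop, "match" if same else "conflict"))

    selected: List[Tuple[str, str]] = []

    def gain(candidate: Tuple[str, str]) -> float:
        prop, _ = candidate
        cand = tokens(a[prop]) | tokens(b[prop])
        redundancy = max(
            (jaccard(cand, tokens(a[p]) | tokens(b[p])) for p, _ in selected),
            default=0.0,
        )
        return weights.get(prop, 1.0) * (1.0 - redundancy)

    while candidates and len(selected) < k:
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)

    # Group the summary: matches first, then conflicts. The sort is stable,
    # so selection order is preserved inside each group.
    selected.sort(key=lambda c: c[1] != "match")
    return [(prop, kind, a[prop], b[prop]) for prop, kind in selected]


# Illustrative data, invented for this sketch.
entity_a = {"name": "Tim Berners-Lee", "label": "Tim Berners-Lee",
            "birthYear": "1955", "country": "UK"}
entity_b = {"name": "Tim Berners-Lee", "label": "Tim Berners-Lee",
            "birthYear": "1956", "country": "UK"}
example_weights = {"name": 3.0, "label": 3.0, "birthYear": 2.0, "country": 0.5}

for row in summarize_pair(entity_a, entity_b, example_weights, k=2):
    print(row)
```

In this toy input, `name` and `label` carry the same value, so once one of them is chosen the other contributes no new information and the conflicting `birthYear` is picked instead. That is the intuition behind preferring diverse features, though the article's actual scoring may differ.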
