Abstract

In this paper, we introduce skyblocking, a novel framework for entity resolution blocking that aims to learn scheme skylines. In this framework, each blocking scheme is mapped to a point in a multi-dimensional scheme space, where each blocking measure represents one dimension. A scheme skyline contains the blocking schemes that are not dominated by any other blocking scheme in the scheme space. Learning scheme skylines efficiently poses two challenges: the class imbalance problem and the search space problem. We tackle these challenges by developing an active sampling strategy and a scheme extension strategy. Based on these two strategies, we develop three scheme skyline learning algorithms that efficiently learn scheme skylines for a given number of blocking measures and within a label budget. Experiments on five real-world datasets show that our algorithms outperform the baseline approaches in label efficiency, blocking quality, and learning efficiency.
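To make the notion of dominance concrete, the following is a minimal sketch of a skyline computation over a scheme space. It is not one of the paper's learning algorithms: the scheme names, the two measure values per scheme, and the assumption that every blocking measure is to be maximized are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is at least as good as b in every dimension
    and strictly better in at least one (assumes higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points: Dict[str, Point]) -> Dict[str, Point]:
    """Keep only the points that no other point dominates.

    Naive pairwise comparison, quadratic in the number of points.
    """
    return {
        name: p
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


# Hypothetical blocking schemes scored on two hypothetical measures.
schemes = {
    "s1": (0.90, 0.20),
    "s2": (0.70, 0.60),
    "s3": (0.60, 0.50),  # dominated by s2 in both dimensions
    "s4": (0.30, 0.95),
}

result = skyline(schemes)
print(sorted(result))
```

In this toy scheme space, s3 is worse than s2 on both measures and is therefore excluded, while s1, s2, and s4 each represent a different trade-off and remain on the skyline.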
