AutoBlock

Wei Zhang, Christos Faloutsos, Bunyamin Sisman, Xin Luna Dong, Hao Wei, David Page

doi:10.1145/3336191.3371813

Abstract

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching, based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking key tuning. (b) Scalability: AutoBlock has a sub-quadratic total time complexity and can be easily deployed for millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when datasets are dirty and/or unstructured.
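The blocking idea in the abstract (map each record to a vector, then keep only each record's nearest neighbors as candidate pairs) can be illustrated with a small sketch. This is not the AutoBlock implementation: character-trigram counts stand in for the learned similarity-preserving representation, and a brute-force scan stands in for the sub-quadratic nearest neighbor search. The function names `trigram_vector`, `cosine`, and `candidate_pairs`, the parameter `k`, and the sample records are all hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Bag of character trigrams; an assumed stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_pairs(records, k=1):
    """Keep, for every record, only its k most similar records as candidates.

    The scan below is brute force (quadratic) to keep the sketch short; a
    real system would use an approximate nearest neighbor index instead.
    """
    vectors = [trigram_vector(r) for r in records]
    pairs = set()
    for i, u in enumerate(vectors):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(vectors) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Hypothetical dirty product records: no shared key, inconsistent formatting.
records = [
    "Apple iPhone 11 64GB black",
    "iphone 11 apple 64 gb (black)",
    "Samsung Galaxy S10 128GB",
    "samsung galaxy s10 - 128 gb",
]
pairs = candidate_pairs(records, k=1)
print(sorted(pairs))
```

On this toy input, only the two pairs of look-alike records survive out of the six possible pairs, and no blocking key had to be designed by hand. The downstream matcher would then compare only those surviving pairs.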

Similar Papers

Efficient Entity Matching over Multiple Data Sources with MapReduce
...
Journal of Information and Data Management | VOL. 5
13 Jul 2014

BUBBLE: A Quality-Aware Human-in-the-loop Entity Matching Framework
Naofumi Osawa ... Atsuyuki Morishima
15 Dec 2021

Deep Entity Matching
Yuliang Li ... Yoshihiko Suhara
Journal of Data and Information Quality | VOL. 13
06 Jan 2021

Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning
Chen Zhao ... Yeye He
13 May 2019
