Towards task-based parallelization for entity resolution

Leonardo Gazzarri, Melanie Herschel

doi:10.1007/s00450-019-00409-6

Abstract

Entity resolution (ER) refers to the problem of finding which virtual representations in one or more data sources refer to the same real-world entity. A central question in ER is how to find matching entity representations (so-called duplicates) efficiently and in a scalable way. One general technique to address these issues is to leverage parallelization. In particular, almost all work on parallel ER focuses on data parallelism. This paper instead focuses on task parallelism for ER. This type of parallelism supports incremental ER, which computes the solution incrementally by streaming the results of intermediate ER stages as soon as they are computed. This can deliver results in a more timely fashion and is also useful in a service-oriented setting with a limited time or monetary budget. In summary, this paper presents a framework for task parallelization of ER that supports, in particular, ER of large amounts of semi-structured and heterogeneous data. We also discuss a possible implementation of our framework.
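
To make the idea of task parallelism with streamed intermediate results concrete, the following is a minimal sketch and not the framework described in the paper. It assumes three hypothetical stages (blocking, comparison, classification) connected by queues, a toy blocking key (first letter of the name), and a string-similarity threshold of 0.8; all function names and parameters are illustrative assumptions. Python threads are used only to show the pipeline structure, not to claim real speed-ups.

```python
import queue
import threading
from difflib import SequenceMatcher

_DONE = object()  # sentinel that marks the end of a stream


def blocking_stage(records, out_q):
    """Emit candidate pairs that share a blocking key (assumed: first letter)."""
    seen = {}
    for rec_id, name in records:
        key = name[:1].lower()
        for other in seen.get(key, []):
            out_q.put((other, (rec_id, name)))  # stream the pair immediately
        seen.setdefault(key, []).append((rec_id, name))
    out_q.put(_DONE)


def comparison_stage(in_q, out_q):
    """Score each candidate pair as soon as it arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        (id_a, name_a), (id_b, name_b) = item
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        out_q.put((id_a, id_b, score))


def classification_stage(in_q, threshold):
    """Yield duplicate pairs incrementally, without waiting for all scores."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        id_a, id_b, score = item
        if score >= threshold:
            yield (id_a, id_b)


def resolve(records, threshold=0.8):
    """Run the stages concurrently and stream duplicates to the caller."""
    pairs_q, scores_q = queue.Queue(), queue.Queue()
    threading.Thread(target=blocking_stage, args=(records, pairs_q), daemon=True).start()
    threading.Thread(target=comparison_stage, args=(pairs_q, scores_q), daemon=True).start()
    yield from classification_stage(scores_q, threshold)


if __name__ == "__main__":
    sample = [(1, "Jon Smith"), (2, "John Smith"), (3, "Jane Doe"), (4, "Alice Wong")]
    for duplicate in resolve(sample):
        print(duplicate)
```

Because `resolve` is a generator, a caller with a limited time or monetary budget can stop consuming results at any point and keep the duplicates found so far.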

Journal: SICS Software-Intensive Cyber-Physical Systems
Publication Date: Aug 26, 2019
Citations: 2

Similar Papers

Indexing Techniques for Real-Time Entity Resolution
01 Mar 2016

Entity Resolution: Overview and Challenges
Hector Garcia-Molina
01 Jan 2004

Exploring Spark-SQL-Based Entity Resolution Using the Persistence Capability
Xiao Chen ... Sravani Mantha
01 Jan 2018

Incremental entity resolution process over query results for data integration systems
Priscilla Kelly Machado Vieira ... Bernadette Farias Lóscio
Journal of Intelligent Information Systems | VOL. 52
29 Jan 2019
