ModER: Graph-based Unsupervised Entity Resolution using Composite Modularity Optimization and Locality Sensitive Hashing

Islam Akef Ebeid,John R Talburt,Md Abdus Salam Siddique,Nicholas Kofi Akortia Hagan

doi:10.14569/ijacsa.2022.0130901

Islam Akef Ebeid, John R Talburt + Show 2 more

Open Access

https://doi.org/10.14569/ijacsa.2022.0130901

Copy DOI

Abstract

Entity resolution describes techniques used to identify documents or records that might not be duplicated; nevertheless, they might refer to the same entity. Here we study the problem of unsupervised entity resolution. Current methods rely on human input by setting multiple thresholds prior to execution. Some methods also rely on computationally expensive similarity metrics and might not be practical for big data. Hence, we focus on providing a solution, namely ModER, capable of quickly identifying entity profiles in ambiguous datasets using a graph-based approach that does not require setting a matching threshold. Our framework exploits the transitivity property of approximate string matching across multiple documents or records. We build on our previous work in graph-based unsupervised entity resolution, namely the Data Washing Machine (DWM) and the Graph-based Data Washing Machine (GDWM). We provide an extensive evaluation of a synthetic data set. We also benchmark our proposed framework using state-of-the-art methods in unsupervised entity resolution. Furthermore, we discuss the implications of the results and how it contributes to the literature.

Full Text