Unsupervised DNF Blocking for Efficient Linking of Knowledge Graphs and Tables

Mayank Kejriwal

doi:10.3390/info12030134

Abstract

Entity Resolution (ER) is the problem of identifying co-referent entity pairs across datasets, including knowledge graphs (KGs). ER is an important prerequisite in many applied KG search and analytics pipelines, with a typical workflow comprising two steps. In the first ‘blocking’ step, entities are mapped to blocks. Blocking avoids comparing all possible pairs of entities: in the second ‘similarity’ step, only entities within the same block are paired and compared, allowing for significant computational savings with a minimal loss of performance. Unfortunately, learning a blocking scheme in an unsupervised fashion is a non-trivial problem, and it has not been properly explored for the heterogeneous, semi-structured datasets that are prevalent in industrial and Web applications. This article presents an unsupervised algorithmic pipeline for learning Disjunctive Normal Form (DNF) blocking schemes on KGs, as well as on structurally heterogeneous tables that may not share a common schema. We evaluate the approach on six real-world dataset pairs, and show that it is competitive with supervised and semi-supervised baselines.

Highlights

Entity Resolution (ER) is the identification of co-referent entities across datasets. Different communities refer to it as instance matching, record linkage, and the merge-purge problem [1,2].
Overall, when considering statistically significant results, the supervised method typically achieves a better Reduction Ratio (RR), but Pairs Completeness (PC) is high for all methods, with the proposed method performing best on dataset pair (DP) 4 and the supervised baseline performing best on DP 2, both with high significance. We believe that the former result was obtained because the proposed method has the strongest approximation bounds out of all three systems, and that this effect would be most apparent on large DPs.
We presented a generic pipeline for learning Disjunctive Normal Form (DNF) blocking schemes on heterogeneous dataset pairs.
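
The two metrics named in the highlights can be illustrated with a small sketch. It assumes the definitions commonly used in the blocking literature (RR as the fraction of the exhaustive comparison space that blocking avoids, PC as the fraction of true matches that survive blocking); the exact formulation used in the paper is not reproduced on this page, and the function names and toy data are hypothetical.

```python
def reduction_ratio(num_candidates, size1, size2):
    """Fraction of the size1 * size2 exhaustive comparisons avoided by blocking.

    Assumes the common definition RR = 1 - |candidates| / (|D1| * |D2|).
    """
    return 1.0 - num_candidates / (size1 * size2)


def pairs_completeness(candidates, true_matches):
    """Fraction of ground-truth matching pairs retained in the candidate set.

    Assumes the common definition PC = |candidates ∩ truth| / |truth|.
    """
    return len(candidates & true_matches) / len(true_matches)


# Toy data (hypothetical): two datasets of 10 entities each.
candidates = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}

rr = reduction_ratio(len(candidates), 10, 10)
pc = pairs_completeness(candidates, truth)
print(f"RR = {rr:.2f}, PC = {pc:.2f}")
```

Under these definitions a blocking scheme is good when both values are close to 1: few candidate pairs are generated, yet few true matches are lost.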

Summary

Introduction

Entity Resolution (ER) is the identification of co-referent entities across datasets. Different communities refer to it as instance matching, record linkage, and the merge-purge problem [1,2]. A blocking key, such as ‘Tokens(LastName)’, could first be applied to each node in the two KGs, as shown in the figure. In essence, this is a function that tokenizes the last name of each customer and assigns the customer to a block indexed by the last-name token. If these graphs each contained millions of entities (which is not uncommon), the total number of pairwise comparisons would number in the trillions (10^6 × 10^6). In practice, an entity in one knowledge graph is only linked to a small number (typically far fewer than five) of entities in the other knowledge graph.
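
The ‘Tokens(LastName)’ blocking key described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it assumes each entity is a flat dictionary with a `LastName` field, uses lowercased whitespace tokenization, and the helper names (`tokens`, `build_blocks`, `candidate_pairs`) and toy records are hypothetical.

```python
from collections import defaultdict
from itertools import product


def tokens(value):
    """Return the set of lowercased whitespace-separated tokens of a field value."""
    return set(value.lower().split())


def build_blocks(entities, field):
    """Map each token of `field` to the set of entity ids that contain it."""
    blocks = defaultdict(set)
    for entity_id, record in entities.items():
        for tok in tokens(record.get(field, "")):
            blocks[tok].add(entity_id)
    return blocks


def candidate_pairs(kg1, kg2, field):
    """Pair entities across the two datasets only if they share a block."""
    blocks1 = build_blocks(kg1, field)
    blocks2 = build_blocks(kg2, field)
    pairs = set()
    for tok in blocks1.keys() & blocks2.keys():
        pairs.update(product(blocks1[tok], blocks2[tok]))
    return pairs


# Toy customer records (hypothetical).
kg1 = {
    "a1": {"LastName": "Smith"},
    "a2": {"LastName": "Van Buren"},
    "a3": {"LastName": "Jones"},
}
kg2 = {
    "b1": {"LastName": "smith"},
    "b2": {"LastName": "Buren"},
    "b3": {"LastName": "Lee"},
}

pairs = candidate_pairs(kg1, kg2, "LastName")
exhaustive = len(kg1) * len(kg2)
print(sorted(pairs), f"{len(pairs)} of {exhaustive} pairs compared")
```

Only pairs that share a last-name token reach the similarity step, so the number of comparisons scales with block sizes rather than with the product of the dataset sizes.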

Journal: Information	Publication Date: Mar 19, 2021
Citations: 3	License type: CC BY 4.0