FTRLIM: Distributed Instance Matching Framework for Large-Scale Knowledge Graph Fusion.

Hongming Zhu, Bowen Du, Yizhi Jiang, Xiaowen Wang, Hongfei Fan, Qin Liu

doi:10.3390/e23050602

Open Access

https://doi.org/10.3390/e23050602

Abstract

Instance matching is a key task in knowledge graph fusion, and improving its efficiency is critical given the increasing scale of knowledge graphs. Blocking algorithms, which select candidate instance pairs for comparison, are one effective way to achieve this goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects’ features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regular-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM participated in OAEI 2019 and achieved the best matching quality with significant efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experimental results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.
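The general idea of blocking described in the abstract can be illustrated with a minimal sketch. This is not the MultiObJ algorithm itself: the choice of feature (a lowercase prefix of each object value), the `|` separator, the function names `block_key` and `candidate_pairs`, and the sample instances are all assumptions made for illustration. The sketch only shows how an index key built from several object values, joined in a fixed order, can restrict comparisons to instances that share a key.

```python
from collections import defaultdict


def block_key(instance, predicates, prefix_len=3):
    """Join a short feature of each selected object value in a fixed order.

    Assumption: the "feature" is a lowercase prefix of the value; the
    paper's actual feature extraction is not reproduced here.
    """
    parts = []
    for predicate in predicates:
        value = str(instance.get(predicate, "")).strip().lower()
        parts.append(value[:prefix_len])
    return "|".join(parts)


def candidate_pairs(source, target, predicates, prefix_len=3):
    """Return (source_id, target_id) pairs whose block keys are equal."""
    # Index the target instances by block key.
    index = defaultdict(list)
    for target_id, instance in target.items():
        index[block_key(instance, predicates, prefix_len)].append(target_id)

    # Look up each source instance; only same-key instances become candidates.
    pairs = []
    for source_id, instance in source.items():
        key = block_key(instance, predicates, prefix_len)
        for target_id in index.get(key, []):
            pairs.append((source_id, target_id))
    return pairs


# Hypothetical sample instances, keyed by an instance identifier.
source = {
    "s1": {"name": "Tongji University", "city": "Shanghai"},
    "s2": {"name": "Fudan University", "city": "Shanghai"},
}
target = {
    "t1": {"name": "tongji univ.", "city": "shanghai"},
    "t2": {"name": "Peking University", "city": "Beijing"},
}

pairs = candidate_pairs(source, target, ["name", "city"])
print(pairs)
print(f"{len(pairs)} candidate pair(s) instead of {len(source) * len(target)}")
```

With this toy data, only one of the four possible pairs shares a block key, so only that pair would be passed on to a more expensive similarity comparison. Building the index and looking up each source instance takes one pass over each graph, which is the intuition behind keeping the overall cost close to linear when blocks stay small.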

Highlights

Due to the lack of unified presentation standards for data and information, and/or the differences in the methods of obtaining data [7], the relevant knowledge of the same entity in the real world is represented in various forms among different knowledge graphs
Since Follow-the-Regular-Leader Instance Matching (FTRLIM) participated in the SPIMBENCH Track at OAEI 2019, the evaluation results are reported
Does the MultiObJ blocking algorithm enable instance matching for large-scale knowledge graphs by reducing the number of candidate pairs with only a slight impact on matching quality?

Summary

Introduction

Due to the lack of unified presentation standards for data and information, and/or the differences in the methods of obtaining data [7], the relevant knowledge of the same entity in the real world is represented in various forms among different knowledge graphs. Instance matching (IM) is defined as establishing a specific type of semantic link between instances. It allows us to explicitly link two instances that refer to the same entity in the real world. When merging different knowledge graphs, instance matching is adopted to achieve consistency and integrity. As the scale of the built knowledge graphs increases, the efficiency and cost requirements of instance matching methods become stricter. Matching instances between knowledge graphs corresponds to the Clique problem in graph theory, which is an NP-complete problem [13,14].

Results

Discussion

Conclusion