Relation-Aware Entity Matching Using Sentence-BERT

Mesfer Al Duhayyim, Mohammed Alamgeer, Manar Ahmed Hamza, Fahd N Al-Wesabi, Haya Mesfer Alshahrani, Anwer Mustafa Hilal

doi:10.32604/cmc.2022.020695

Abstract

Entity matching is a key aspect of knowledge fusion. This study investigates how to identify heterogeneous expressions of the same real-world entity. In recent years, several representative works have applied deep learning to entity matching and achieved good results. However, these methods share two limitations: they assume that the different attribute columns of an entity are independent, and they feed entity records into the model in pairs, which causes repeated computation. In fact, there are often latent relations between the attribute columns of different entities. These relations can improve entity matching, and extracting features from each entity record individually avoids the repeated computation. To use attribute relations to assist entity matching, this paper proposes the Relation-aware Entity Matching method, which embeds attribute relations into the original entity description to form sentences. Entity matching is thereby transformed into a sentence-level similarity task, and the sentence similarity is computed with Sentence-BERT. We conducted experiments on structured, dirty, and textual data and compared our method with recent baselines. Experimental results show that relation embedding helps entity matching on structured and dirty data. Our method achieves good results on most datasets and reduces repeated computation.

Highlights

The goal of entity matching (EM) is to identify heterogeneous expressions of the same real-world entity
Tab. 2 compares Relation-aware Entity Matching using Sentence-BERT (REMS) on structured data with several baselines and with the latest method, Ditto, without its optimizations
Structured data is relatively clean: each entity record consists of several attribute columns. REMS performs well on structured data and achieves the best F1-score on three structured datasets

Summary

Introduction

The goal of entity matching (EM) is to identify heterogeneous expressions of the same real-world entity. For example, “Microsoft word 2007 version upgrade” and “Microsoft word 2007” both describe the product Microsoft Word 2007 and need to be identified as the same entity. Given two datasets, our goal is to find pairs of entity records, one from each dataset, that describe the same real-world entity. For product data, the task of EM is therefore to find the product records in the two datasets that may match.
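
The idea of turning each record into a sentence, encoding it once, and comparing sentence embeddings can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence template in `record_to_sentence`, the `manufacturer` attribute, and the second candidate record are hypothetical, and `toy_encode` is a bag-of-words stand-in for Sentence-BERT so that the example runs without a pretrained model.

```python
import math
from collections import Counter


def record_to_sentence(record):
    """Join attribute names and values into one sentence.

    Hypothetical template; the paper's actual relation embedding may differ.
    Attributes with empty values are skipped.
    """
    return " ; ".join(f"{attr} is {value}" for attr, value in record.items() if value)


def toy_encode(sentence):
    """Stand-in encoder: lowercase bag-of-words counts.

    A real system would produce a dense Sentence-BERT embedding here.
    """
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_scores(left, right):
    """Encode every record exactly once, then compare embeddings pairwise."""
    left_vecs = [toy_encode(record_to_sentence(r)) for r in left]
    right_vecs = [toy_encode(record_to_sentence(r)) for r in right]
    return [[cosine(u, v) for v in right_vecs] for u in left_vecs]


# Illustrative records; only the two Microsoft titles come from the text above.
left = [{"title": "Microsoft word 2007 version upgrade", "manufacturer": "Microsoft"}]
right = [
    {"title": "Microsoft word 2007", "manufacturer": "Microsoft"},
    {"title": "Adobe Photoshop CS3", "manufacturer": "Adobe"},
]
scores = match_scores(left, right)
```

In this sketch `scores[0][0]` is higher than `scores[0][1]`, so the two Microsoft records are ranked as the more likely match. Because each record is encoded on its own, the number of encoder calls grows with the number of records rather than with the number of candidate pairs.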

Objectives

Methods

Results

Conclusion