Name2Vec: Name Matching using Character-based with Deep Learning

Xuan Truong Dinh

doi:10.1016/j.procs.2023.12.086

Abstract

Name matching plays a crucial role in big data and data integration applications, and it is indispensable when consolidating information from diverse sources. Relevant tasks include deduplication, data linkage systems, search engines, text and web mining, and information extraction. Discrepancies and anomalies in names can cause true matches to be missed. These include syntactic variations such as abbreviations, typographical errors, omitted whitespace, inserted or deleted words, and multiple spellings of the same name. Previous methods often applied a predefined penalty to each differing character or multi-character token between two strings. This research introduces Name2Vec, an algorithm that performs name matching with a neural network model that captures name semantics. The approach further proposes a feature set that fuses Name2Vec and character-based name representations. Empirical results show that this combination improves matching efficiency and reduces misclassifications compared with state-of-the-art methods.
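The abstract contrasts penalty-based string comparison with character-based and learned representations. The following is a minimal sketch, not the author's Name2Vec implementation, that illustrates these ingredients under stated assumptions. It includes a Levenshtein-style distance with fixed per-character penalties, which is the kind of predefined scheme the abstract attributes to earlier methods, and character bigram vectors compared by cosine similarity. A hypothetical `pair_features` function concatenates these scores with an optional learned-embedding similarity. The `embed` callable is a placeholder for a trained name-embedding model, and its interface is an assumption.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Optional, Sequence


def weighted_edit_distance(a: str, b: str, sub_cost: float = 1.0,
                           ins_cost: float = 1.0, del_cost: float = 1.0) -> float:
    """Levenshtein-style distance: a fixed penalty for each differing character."""
    prev = [j * ins_cost for j in range(len(b) + 1)]  # empty prefix of a -> b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * del_cost] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            curr[j] = min(
                prev[j] + del_cost,                          # delete a[i-1]
                curr[j - 1] + ins_cost,                      # insert b[j-1]
                prev[j - 1] + (0.0 if same else sub_cost),   # match or substitute
            )
        prev = curr
    return prev[len(b)]


def char_ngrams(name: str, n: int = 2) -> Counter:
    """Bag of character n-grams; '#' padding marks the first and last letters."""
    padded = f"#{name.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def dense_cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = sqrt(sum(p * p for p in x)) * sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pair_features(a: str, b: str,
                  embed: Optional[Callable[[str], Sequence[float]]] = None) -> List[float]:
    """Hypothetical fused feature vector for a candidate name pair."""
    longest = max(len(a), len(b)) or 1
    feats = [
        # Unit-cost edit distance never exceeds the longer length, so this lies in [0, 1].
        1.0 - weighted_edit_distance(a.lower(), b.lower()) / longest,
        cosine(char_ngrams(a), char_ngrams(b)),  # character bigram overlap
    ]
    if embed is not None:  # placeholder for a learned name embedding (assumption)
        feats.append(dense_cosine(embed(a), embed(b)))
    return feats
```

For example, `weighted_edit_distance("kitten", "sitting")` is 3, and `pair_features("Jon", "John")` yields an edit similarity of 0.75 plus a bigram cosine between 0 and 1. In a fused scheme like the one the abstract describes, such a vector would feed a downstream match/non-match classifier. The specific features and classifier here are illustrative choices only.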
