Abstract

Name disambiguation has long been a significant problem in many fields, such as literature management and social analysis. In recent years, methods based on graph networks have performed well in name disambiguation, but these works have rarely used heterogeneous graphs to capture the relationships between nodes. Heterogeneous graphs can extract more comprehensive relational information, from which more accurate node embeddings can be learned. We therefore propose a Dual-Channel Heterogeneous Graph Network (DHGN) to solve the name disambiguation problem. We use a heterogeneous graph network to capture information from various node types, so that our method learns more accurate structural information about the data. In addition, we use fastText to extract the semantic information of the data. A clustering method based on DBSCAN then assigns academic papers written by different authors to different clusters. In extensive experiments on real-world datasets, our method achieved high accuracy, demonstrating its effectiveness.
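
The abstract names fastText as the source of the semantic channel. As a rough illustration of why fastText suits short, noisy paper metadata, the sketch below mimics its subword scheme: each word is wrapped in "<" and ">" and split into character n-grams of length 3 to 6 (fastText's default minn/maxn), each n-gram is hashed into one of 2,000,000 buckets (the default bucket count), and a word vector is the average of its subword vectors. This is an assumption-laden toy, not the paper's configuration: the bucket vectors are random stand-ins for trained weights, and MD5 replaces fastText's own FNV-1a hash. In practice one would train a model with the `fasttext` Python package (for example `fasttext.train_unsupervised`) and call its `get_sentence_vector` method.

```python
import hashlib
import math
import random

# Values mirror fastText's documented defaults (dim=100, minn=3, maxn=6,
# bucket=2000000); the vectors themselves are untrained random stand-ins.
DIM = 100
MINN, MAXN = 3, 6
BUCKETS = 2_000_000


def char_ngrams(word: str, minn: int = MINN, maxn: int = MAXN) -> list[str]:
    """fastText-style subwords: the word wrapped in '<' and '>' plus all of
    its character n-grams with minn <= n <= maxn."""
    wrapped = f"<{word}>"
    grams = [wrapped]
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            gram = wrapped[i:i + n]
            if gram != wrapped:  # avoid listing the whole word twice
                grams.append(gram)
    return grams


def bucket_vector(gram: str) -> list[float]:
    """Hash an n-gram to a bucket and return that bucket's random vector.
    MD5 is a stand-in; fastText uses its own FNV-1a hash."""
    bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % BUCKETS
    rng = random.Random(bucket)  # deterministic vector per bucket
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> list[float]:
    """Average of the word's subword vectors."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for gram in grams:
        for k, value in enumerate(bucket_vector(gram)):
            total[k] += value
    return [v / len(grams) for v in total]


def text_vector(text: str) -> list[float]:
    """Average of L2-normalised word vectors, e.g. for a paper's title."""
    total = [0.0] * DIM
    count = 0
    for word in text.lower().split():
        vec = word_vector(word)
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            count += 1
            for k, value in enumerate(vec):
                total[k] += value / norm
    return [v / count for v in total] if count else total


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    print(char_ngrams("graph"))
    print(cosine(word_vector("embedding"), word_vector("embeddings")))
    print(cosine(word_vector("embedding"), word_vector("clustering")))
```

Because "embedding" and "embeddings" share most of their n-grams, their vectors come out far more similar than those of "embedding" and "clustering", even with untrained weights. This subword sharing is what lets fastText-style models cope with inflections and spelling variants in titles and keywords.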

Highlights

  • Every day, new papers are added to academic paper databases (Information 2021, 12, 383).

  • When a user searches for a name shared by several authors, the search results contain many authors or articles unrelated to the person the user is looking for.

  • In the DBLP digital library, a search for the name “Li Jie” returns a large number of authors whose names are written as “Li Jie” in English; their Chinese names differ, and they are not the same person.


Summary

Introduction

Every day, new papers are added to academic paper databases. In 2020, the total number of publications in the DBLP database was close to 5.5 million, and the annual growth rates over the past three years were 10.46%, 10.44%, and 10.37%, respectively. In the DBLP digital library, for example, a search for the name “Li Jie” returns a large number of authors whose names are written as “Li Jie” in English, although their Chinese names differ and they are not the same person. In such cases, the accuracy of paper search results decreases. However, few methods use heterogeneous graphs [4] to extract the structural information between data items and combine it with the semantic information of the text to learn more accurate node representations. Extensive experiments on real-world datasets demonstrate the effectiveness of our method.
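
To make the idea of structural information from a heterogeneous graph concrete, the sketch below is a minimal illustration and not the DHGN architecture. It assumes hypothetical node types (co-author, venue, organization) attached to each paper and counts paper-X-paper meta-path instances, so that two papers sharing a co-author, venue, or organization receive a weighted relational link. The records, names, and per-type weights are invented for illustration; each row of the resulting adjacency serves as a simple relational vector for one paper.

```python
from collections import defaultdict

# Hypothetical toy records: four papers that all list an author named "Li Jie".
# The node types (co-author, venue, organization) are illustrative
# assumptions, not the exact schema used by DHGN.
PAPERS = {
    "p1": {"coauthors": {"Wang Wei", "Zhang Min"}, "venue": "KDD", "org": "Tsinghua"},
    "p2": {"coauthors": {"Wang Wei"}, "venue": "KDD", "org": "Tsinghua"},
    "p3": {"coauthors": {"Chen Hao"}, "venue": "CVPR", "org": "Zhejiang"},
    "p4": {"coauthors": {"Chen Hao", "Liu Yang"}, "venue": "ICCV", "org": "Zhejiang"},
}

# Hypothetical weight for each meta-path type P-X-P.
WEIGHTS = {"author": 1.0, "venue": 0.5, "org": 0.5}


def build_hetero_graph(records):
    """Map each typed non-paper node (type, id) to the set of papers linked to it."""
    graph = defaultdict(set)
    for pid, rec in records.items():
        for name in rec["coauthors"]:
            graph[("author", name)].add(pid)
        graph[("venue", rec["venue"])].add(pid)
        graph[("org", rec["org"])].add(pid)
    return graph


def metapath_adjacency(records, weights):
    """Weighted paper-paper adjacency: every P-X-P path through a node of
    type X adds weights[X] to the link between the two papers."""
    graph = build_hetero_graph(records)
    ids = sorted(records)
    adj = {a: {b: 0.0 for b in ids} for a in ids}
    for (node_type, _), pids in graph.items():
        w = weights.get(node_type, 0.0)
        for a in pids:
            for b in pids:
                if a != b:
                    adj[a][b] += w
    return adj


if __name__ == "__main__":
    adj = metapath_adjacency(PAPERS, WEIGHTS)
    for pid in sorted(adj):
        # Each row is a simple relational vector for one paper.
        print(pid, [adj[pid][other] for other in sorted(adj)])
```

In this toy graph, p1 and p2 get the strongest link (shared co-author, venue, and organization), p3 and p4 a weaker one, and the two pairs are disconnected, which is the kind of structural evidence a text-only model would miss.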

Heterogeneous Graph Network
Author Name Disambiguation
Problem Formulation
DHGN: The Proposed Method for Author Name Disambiguation
Use FastText to Construct Semantic Representation Vector
Use Heterogeneous Graph to Construct Relational Representation Vector
Use DBSCAN for Node Clustering
Datasets
Baselines
Results
Conclusions
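
The last method step in the outline clusters papers with DBSCAN. DBSCAN fits author name disambiguation because the number of distinct people sharing a name is not known in advance, and isolated papers can be left as noise instead of being forced into a cluster. The pure-Python sketch below is the textbook DBSCAN algorithm with cosine distance. The toy vectors stand in for the fused semantic and relational embeddings, and the distance metric, `eps`, and `min_samples` values are assumptions rather than settings reported for DHGN.

```python
import math

NOISE = -1

# Toy "fused" paper vectors (hypothetical): in DHGN these would combine the
# semantic (fastText) and relational (heterogeneous graph) channels.
POINTS = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],  # papers by one "Li Jie"
    [0.0, 1.0, 0.9], [0.1, 0.9, 1.0], [0.0, 1.0, 1.0],  # papers by another
    [-1.0, 0.0, 0.0],                                   # an isolated paper
]


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points)) if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN. Returns one label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:  # j is also core: expand
                seeds.extend(j_neighbours)
    return labels


if __name__ == "__main__":
    print(dbscan(POINTS, eps=0.1, min_samples=2))  # -> [0, 0, 0, 1, 1, 1, -1]
```

With scikit-learn the same step would typically be `DBSCAN(eps=..., min_samples=..., metric="cosine").fit_predict(X)`, which likewise labels noise points as -1; the hand-written version above only makes the core-point and border-point logic explicit.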