A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory

Yingying Ma, Youlong Wu, Chengqiang Lu

doi:10.3390/e22040416

Abstract

Name ambiguity, which arises because many people share identical names, often degrades the performance of information integration, document retrieval and web search. In academic data analysis, author name ambiguity likewise lowers the quality of the analysis. The author name disambiguation task addresses this problem by dividing the documents associated with an author name reference into several parts, each corresponding to one real-life person. Existing methods usually use either the attributes of documents or the relationships between documents and co-authors. However, feature extraction based on attributes makes models inflexible, while solutions based on relationship graph networks ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning that incorporates both attributes and relationships. Experiments on a public real-world dataset demonstrate the effectiveness of our model and show that it outperforms several state-of-the-art graph-based methods. We also improve the interpretability of our method through information theory and show that this analysis can help with model selection and with tracking training progress.

Highlights

Entity linking tasks recognize named entities and link (disambiguate) them to the corresponding entities in a knowledge base
The author name disambiguation task aims to partition publications written by different people who share the same name so that each partition contains only documents associated with one real-life person
Experimental results indicate that our solution achieves significantly better performance than several state-of-the-art graph-based methods, including Zhang and Yao [4], Zhang et al. [5] and GHOST [6]

Summary

Introduction

Entity linking tasks recognize named entities in text and link (disambiguate) them to the corresponding entities in a knowledge base. Entity linking is a significant problem in natural language processing and has been studied extensively. One important entity linking task is author name disambiguation, which aims to partition publications written by different people who share the same name so that each partition contains only documents associated with one real-life person. Author name disambiguation is crucial in bibliographic data analysis and document retrieval. When someone searches a database for the publications of a scholar named “Charles”, the query may return many papers by different people named “Charles”, which causes ambiguity and degrades the search results. Similarly, an organization that wants to measure the impact of many authors needs to know exactly which publications belong to each of them.
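The partitioning task defined above can be made concrete with a small sketch. This is not the model proposed in this paper. It is a naive, hypothetical baseline: the function `partition_by_coauthors` and the toy records are illustrative assumptions. It treats two papers carrying the ambiguous name as belonging to the same person whenever they share a co-author, and returns the connected components of that co-author relationship graph, computed with union-find.

```python
from collections import defaultdict


def partition_by_coauthors(docs, target_name):
    """Hypothetical baseline: group documents linked through shared co-authors.

    docs: mapping doc_id -> list of author names (each list includes target_name).
    Returns a list of sets of doc ids, one set per presumed real-life person.
    """
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Walk up to the root, halving the path to keep the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index documents by every co-author other than the ambiguous name itself.
    docs_by_coauthor = defaultdict(list)
    for doc_id, authors in docs.items():
        for name in authors:
            if name != target_name:
                docs_by_coauthor[name].append(doc_id)

    # Assumption of this baseline: a shared co-author implies the same person.
    for doc_ids in docs_by_coauthor.values():
        for other in doc_ids[1:]:
            union(doc_ids[0], other)

    # Collect each connected component as one partition.
    clusters = defaultdict(set)
    for doc_id in docs:
        clusters[find(doc_id)].add(doc_id)
    return sorted(clusters.values(), key=min)


# Toy records for the ambiguous name "Charles" (illustrative data only).
docs = {
    "p1": ["Charles", "Alice Wang"],
    "p2": ["Charles", "Alice Wang", "Bob Li"],
    "p3": ["Charles", "Bob Li"],
    "p4": ["Charles", "Dana Kim"],
}
print(partition_by_coauthors(docs, "Charles"))
```

Here p1, p2 and p3 end up in one partition (linked through Alice Wang and Bob Li) and p4 forms its own. A rule like this ignores document attributes such as titles or venues, and a single shared co-author can wrongly merge two different people. Ignoring attribute information is the limitation the abstract attributes to purely relationship-based solutions.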

Objectives

Methods

Results

Conclusion