Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding

Ziyue Qiao, Yi Du, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

doi:10.1109/bigdata47090.2019.9005458

Abstract

Many people share the same name. When a digital library user searches for an author name, they may see a mixture of publications by different authors who have that name. Distinguishing between these authors is an important prerequisite for improving the quality of services and content in digital libraries. The general task of author disambiguation is to assign publications that carry an identical name, or names with highly similar spellings, to the distinct people who wrote them. In recent years, much research has addressed this challenging task. However, some approaches rely heavily on external knowledge bases and manually annotated data, and some approaches based on unsupervised learning require complex feature engineering. In this paper, we propose a novel and efficient author disambiguation framework that needs no labeled data. We first construct a heterogeneous publication network for each ambiguous name. Then, we use our proposed heterogeneous graph convolutional network embedding method, which encodes both graph structure and node attribute information, to learn publication representations. After that, we propose a graph-enhanced clustering method for name disambiguation that greatly accelerates the clustering process and does not require the number of distinct persons to be known in advance. Our framework can be continually retrained and applied to the incremental disambiguation task when new publications are added. Experimental results on two datasets show that our framework outperforms several state-of-the-art methods for author disambiguation.
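To make the first step concrete, the following is a minimal sketch of what a heterogeneous publication network for one ambiguous name might look like. It is not the framework proposed in this paper: the toy records, the relation names `co_author` and `co_venue`, and the helper functions are all hypothetical, and the grouping step uses plain connected components over shared coauthors as a stand-in for the embedding and graph-enhanced clustering stages. The only property it shares with the proposed clustering is that the number of distinct persons is not supplied in advance.

```python
from collections import defaultdict

# Hypothetical toy records for one ambiguous name; every field is illustrative.
publications = {
    "p1": {"coauthors": {"A. Lee"}, "venue": "KDD"},
    "p2": {"coauthors": {"A. Lee", "B. Chen"}, "venue": "ICDM"},
    "p3": {"coauthors": {"C. Diaz"}, "venue": "CHI"},
    "p4": {"coauthors": {"C. Diaz"}, "venue": "CHI"},
}

def build_network(pubs):
    """Return typed edges (relation, pub_a, pub_b) between publications."""
    edges = set()
    ids = sorted(pubs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Assumed relation: the two papers share at least one coauthor.
            if pubs[a]["coauthors"] & pubs[b]["coauthors"]:
                edges.add(("co_author", a, b))
            # Assumed relation: the two papers appeared in the same venue.
            if pubs[a]["venue"] == pubs[b]["venue"]:
                edges.add(("co_venue", a, b))
    return edges

def group_by_relation(pubs, edges, relation):
    """Connected components over one edge type, computed with union-find."""
    parent = {p: p for p in pubs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rel, a, b in edges:
        if rel == relation:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in pubs:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())

edges = build_network(publications)
clusters = group_by_relation(publications, edges, "co_author")
print(clusters)
```

In this toy setting the typed edge set is the heterogeneous network, and each resulting group is a candidate author identity. A real system would replace the component step with learned publication representations, since shared coauthors alone miss authors who change collaborators.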

Full Text