A supervised and distributed framework for cold-start author disambiguation in large-scale publications

Yibo Chen, Liping Gao, Jianliang Gao, Zhiyi Jiang, Zhao Li, Hongliang Du

doi:10.1007/s00521-020-05684-y

Abstract

Names make up a large portion of search engine queries, and name ambiguity degrades the service quality of search engines. In digital academic systems, the problem manifests as a large number of publications that carry ambiguous author names. Name ambiguity arises when many people share an identical name or when names are abbreviated. Although a number of methods have been proposed over the past decade, the problem is still not completely solved, and many subproblems remain to be studied. Because information is scarce, accurately distinguishing ambiguous authors from limited internal information alone is a nontrivial task. In this paper, we focus on the cold-start disambiguation task with homonymous author names, i.e., distinguishing publications written by different authors who share an identical name. We present a supervised framework named DND (Distributed Framework for Name Disambiguation) to solve the author disambiguation problem efficiently. DND uses the accessible information to train a robust function that measures the similarity between publications and then determines whether they belong to the same author. In traditional clustering-based approaches to author disambiguation, the number of clusters, i.e., the number of authors sharing the same name, is hard to predict in advance; DND instead transforms the clustering task into a linkage prediction task, which avoids specifying the number of clusters. We validate the effectiveness of DND on two real-world datasets. The experimental results indicate that DND achieves competitive performance compared with the baselines.
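To illustrate why casting disambiguation as linkage prediction removes the need to fix the number of clusters, the following is a minimal sketch and not the DND implementation. It assumes each publication is a dictionary with hypothetical `coauthors` and `venue` fields, and it replaces the trained similarity function with a hand-weighted score (the weights 0.7 and 0.3 and the threshold 0.5 are arbitrary illustrative values). Pairs scored above the threshold are linked, and the connected components of the resulting graph are taken as authors.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(p, q):
    # Hypothetical stand-in for a trained similarity function:
    # a fixed-weight mix of co-author overlap and venue match.
    coauthor = jaccard(p["coauthors"], q["coauthors"])
    venue = 1.0 if p["venue"] == q["venue"] else 0.0
    return 0.7 * coauthor + 0.3 * venue


def disambiguate(pubs, threshold=0.5):
    """Link similar pairs, then return connected components as index lists."""
    parent = list(range(len(pubs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Predict a link for every pair whose score reaches the threshold.
    for i, j in combinations(range(len(pubs)), 2):
        if similarity(pubs[i], pubs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Each component is one inferred author; the count is never specified.
    groups = {}
    for i in range(len(pubs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy publications that all carry the same ambiguous author name.
pubs = [
    {"coauthors": ["A. Smith", "B. Lee"], "venue": "KDD"},
    {"coauthors": ["A. Smith", "B. Lee", "C. Wu"], "venue": "KDD"},
    {"coauthors": ["D. Park"], "venue": "CVPR"},
    {"coauthors": ["D. Park", "E. Kim"], "venue": "CVPR"},
]

print(disambiguate(pubs))  # [[0, 1], [2, 3]]
```

In this sketch the number of authors (two in the toy data) emerges from the predicted links rather than being supplied up front. Comparing all pairs is quadratic in the number of publications per name, so the sketch says nothing about how such a computation would be distributed at scale.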

Full Text