Automatic author name disambiguation by differentiable feature selection

Zhijian Fang,Zhechong Tang,Jinying Xu,Zijie Jia,Yue Zhuo,Huaxiong Zhang

doi:10.1177/01655515231193859

Zhijian Fang, Zhechong Tang + Show 4 more

https://doi.org/10.1177/01655515231193859

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Author name disambiguation (AND) is the task of resolving the ambiguity problem in bibliographic databases, where distinct real-world authors may share the same name or same author may have distinct names. The aim of AND is to split the name-ambiguous entities (articles) into the corresponding authors. Existing AND algorithms mainly focus on designing different similarity metrics between two ambiguous articles. However, most previous methods empirically select and process the features of entities, then use features to predict the similarity by data-driven models. In this article, we are motivated by natural questions: Which features are most useful for splitting name-ambiguous entities? Can they be automatically determined by an optimisation approach rather than heuristic feature engineering? Therefore, we proposed a novel end-to-end differentiable feature selection algorithm, automatically searching the optimal features for AND task (AAND). AAND optimises the discrete feature selection by differentiable Gumbel-Softmax, leading to the joint learning of feature selection policy and similarity prediction model. The experiments are conducted on a benchmark data set, S2AND, which harmonises eight different AND data sets. The results show that the performance of our proposal is superior to the advanced AND methods and feature selection algorithms. Meanwhile, deep insights into AND features are also given.

Full Text