Abstract
Author name ambiguity arises in two situations: when multiple authors share the same name, or when the same author writes her name in multiple ways. The former is called a homonym and the latter a synonym. Disambiguating these authors is non-trivial because citation data sets contain only a limited amount of information. This paper proposes a graph structural clustering algorithm, “LUCID: Author Name Disambiguation using Graph Structural Clustering”, which disambiguates authors using a community detection algorithm and graph operations. In the first phase, LUCID performs preprocessing on the data set and creates blocks of ambiguous authors. In the second phase, a coauthor graph is built and “SCAN: A Structural Clustering Algorithm for Networks” is applied to detect hubs, outliers, and clusters of nodes (author communities). A hub node that intersects with many clusters is considered a homonym and is resolved by splitting across this node. Finally, synonyms are disambiguated using the proposed hybrid similarity function. LUCID's performance is evaluated on a real data set from Arnetminer. Results show that LUCID performs better overall than the baseline methods, achieving 97% pairwise precision, 74% pairwise recall, and 82% pairwise F1.
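The pairwise metrics reported above can be illustrated with a minimal Python sketch. It assumes the usual definition, in which every unordered pair of records placed in the same cluster counts as one prediction; the abstract does not spell out the exact evaluation procedure, so the function names (`same_cluster_pairs`, `pairwise_scores`) and the toy labels are hypothetical and not taken from LUCID.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of unordered record pairs that share a label.

    `labels` maps a record id (e.g. a paper id) to a cluster or author label.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def pairwise_scores(predicted, truth):
    """Compute pairwise precision, recall, and F1 for one block of records."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)

    # Guard against empty pair sets (e.g. every record in its own cluster).
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy block (hypothetical data): four papers under one ambiguous name.
truth = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
predicted = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}

precision, recall, f1 = pairwise_scores(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy block the clustering proposes two pairs, of which one is correct, while the ground truth contains three pairs. High precision with lower recall, as in the reported results, corresponds to clusters that are rarely mixed but that sometimes split one author's records across several clusters.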