Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity

Ijaz Hussain, Sohail Asghar

doi:10.1007/s13369-018-3099-0

Abstract

Author name ambiguity occurs when multiple authors share the same name (homonyms) or when a single author writes their name in several different ways (synonyms). It degrades the quality of information retrieval and the correct attribution of publications to authors in bibliographic databases. Despite extensive research over the past decade, the author name ambiguity problem remains largely unsolved. Outstanding issues with existing methods include limited capability (resolving only homonyms or only synonyms), reliance on extra information (the Web or user feedback), the need to know the actual number of authors K in advance, and poor scalability. In this paper, we propose GCLUSIM, a method that combines graph structural clustering with a proposed similarity measure to resolve ambiguous authors. GCLUSIM preprocesses the citation data set and constructs a co-author graph. Graph-based structural clustering is then applied to this graph to identify hub nodes, outliers, and clusters of nodes. GCLUSIM resolves homonyms by splitting these clusters when the feature vector similarity between them is below a predefined threshold, and it resolves synonyms by exploiting the proposed similarity. Finally, it disambiguates sole authors by comparing their name and feature vector similarities with those of the disambiguated clusters. Experiments on the Arnetminer and BDBComp data sets validate the performance of GCLUSIM. The results show that GCLUSIM is scalable, performs better overall than the baselines, and finds a number of clusters close to the number of ground truth clusters.
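The co-author graph construction and structural clustering steps described above can be illustrated with a minimal Python sketch. The abstract does not name the clustering algorithm. Because its three node roles (clusters, hubs, outliers) match those of SCAN-style structural clustering, the sketch assumes that technique, with its usual parameters eps (minimum structural similarity) and mu (minimum neighbourhood size for a core node). The input format (one list of author names per citation record) and the function names build_coauthor_graph, structural_similarity, and scan are hypothetical illustrations, not GCLUSIM's actual implementation.

```python
from collections import defaultdict, deque
from itertools import combinations
from math import sqrt


def build_coauthor_graph(papers):
    """Build an undirected co-author graph (hypothetical input: one author list per paper)."""
    graph = {}
    for authors in papers:
        names = list(dict.fromkeys(authors))  # de-duplicate, keep order
        for a in names:
            graph.setdefault(a, set())        # sole authors become isolated nodes
        for a, b in combinations(names, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def structural_similarity(graph, u, v):
    """SCAN-style similarity: shared closed neighbourhood, normalised by sizes."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def scan(graph, eps=0.7, mu=3):
    """Assumed SCAN-style clustering; returns (clusters, hubs, outliers)."""
    def eps_neighbours(v):
        return {w for w in graph[v] | {v}
                if structural_similarity(graph, v, w) >= eps}

    label = {}
    cluster_id = 0
    for v in graph:
        if v in label or len(eps_neighbours(v)) < mu:
            continue  # already clustered, or not a core node
        cluster_id += 1
        label[v] = cluster_id
        queue = deque([v])
        while queue:
            x = queue.popleft()
            nx = eps_neighbours(x)
            if len(nx) < mu:
                continue  # border node: joins the cluster but does not expand it
            for w in nx:
                if w not in label:
                    label[w] = cluster_id
                    queue.append(w)

    # Unclustered nodes: hubs touch two or more clusters, outliers do not.
    hubs, outliers = set(), set()
    for v in graph:
        if v in label:
            continue
        touched = {label[w] for w in graph[v] if w in label}
        if len(touched) >= 2:
            hubs.add(v)
        else:
            outliers.add(v)

    clusters = defaultdict(set)
    for v, c in label.items():
        clusters[c].add(v)
    return list(clusters.values()), hubs, outliers


if __name__ == "__main__":
    papers = [["A", "B", "C"], ["D", "E", "F"], ["C", "H"], ["H", "D"], ["Z"]]
    g = build_coauthor_graph(papers)
    print(scan(g, eps=0.7, mu=3))
```

In this toy example the two tightly connected author groups form clusters, the bridging author H becomes a hub, and the isolated author Z becomes an outlier. The later GCLUSIM steps (splitting clusters whose feature vector similarity falls below the threshold, merging synonyms with the proposed similarity, and assigning sole authors) are not shown.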
