Social Network Analysis on Name Disambiguation and More

Byung-Won On

doi:10.1109/iccit.2008.210

Abstract

Name variants are ubiquitous in the real world due to typographical errors (e.g., Forschungszentrum Julich vs. Forschungszentrum Julich), abbreviated, incomplete, or missing information (e.g., R. E. Ellis vs. Randy E. Ellis), the lack of a standard name formatting convention (e.g., Spike Jonze vs. Jonze, Spike), and combinations of these. In this paper, we project the name disambiguation problem onto a graph representation and then analyze the resulting graphs using social network analysis. In particular, we use real duplicate name entities from the ACM Digital Library that we verified manually. Then, using various string similarity metrics and additional information (i.e., co-author names, titles, and venues), we analyze the effectiveness of both the string similarity metrics and the additional information based on social network analysis. Our experimental validation shows that the name disambiguation problem can be analyzed in a graphical, visual manner.
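As an illustration of the graph projection described above, the following is a minimal sketch, not the method used in the paper. It assumes a character-bigram Jaccard similarity as the string metric and an arbitrary threshold of 0.4; the abstract names neither. The function names (`bigrams`, `jaccard`, `build_graph`, `connected_components`) are hypothetical. Names become nodes, an edge links two names whose similarity reaches the threshold, and each connected component is read as a candidate group of variants of one entity.

```python
from itertools import combinations


def bigrams(name):
    """Return the set of character bigrams of a name, ignoring case and punctuation."""
    s = "".join(ch for ch in name.lower() if ch.isalnum())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the bigram sets of two names (assumed metric)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def build_graph(names, threshold=0.4):
    """Build an undirected graph: one node per name, one edge per similar pair.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    graph = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if jaccard(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Name variants taken from the examples in the abstract.
names = ["Spike Jonze", "Jonze, Spike", "Randy E. Ellis", "R. E. Ellis"]
graph = build_graph(names)
components = connected_components(graph)
```

With these four names, the two orderings of "Spike Jonze" end up in one component and the two forms of "Ellis" in another. Simple graph measures such as node degree or component size can then be computed on `graph` to compare how different metrics or thresholds group the same names.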

Full Text