Abstract

Name variants are ubiquitous in the real world due to typographical errors (e.g., Forschungszentrum Jülich vs. Forschungszentrum Julich), abbreviated, incomplete, or missing information (e.g., R. E. Ellis vs. Randy E. Ellis), the lack of a standard name formatting convention (e.g., Spike Jonze vs. Jones, Spike), and combinations of these. In this paper, we map the name disambiguation problem onto a graph representation and then analyze the resulting graphs using social network analysis. In particular, we use real duplicate name entities that we manually verified from the ACM Digital Library. Then, using various string similarity metrics and additional information (i.e., co-author names, titles, and venues), we analyze the effectiveness of these string similarity metrics and of the additional information based on social network analysis. Our experimental validation shows that the name disambiguation problem can be analyzed in a graphical, visual manner.
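To make the graph view concrete, the following is a minimal sketch, not the paper's actual method. It treats each name variant as a node and adds an edge when a combined score of string similarity and co-author overlap passes a threshold. The token-based Jaccard similarity is a swapped-in stand-in, because the abstract does not say which string metrics were used. The `alpha` weight, the `threshold`, and all co-author names in the example input are hypothetical. Connected components then group variants that plausibly refer to the same entity.

```python
from itertools import combinations


def tokens(name: str) -> set[str]:
    # Normalize a name: lowercase, treat commas and periods as separators.
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return set(cleaned.split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(entities: dict[str, set[str]], threshold: float = 0.5,
                alpha: float = 0.5) -> dict[tuple[str, str], float]:
    # entities maps a name variant to its (hypothetical) set of co-author names.
    # alpha weights name similarity against co-author overlap (assumed values).
    edges = {}
    for u, v in combinations(entities, 2):
        name_sim = jaccard(tokens(u), tokens(v))
        coauthor_sim = jaccard(entities[u], entities[v])
        score = alpha * name_sim + (1 - alpha) * coauthor_sim
        if score >= threshold:
            edges[(u, v)] = score
    return edges


def components(nodes: list[str], edges: dict[tuple[str, str], float]) -> list[set[str]]:
    # Connected components via iterative depth-first search.
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.add(x)
            stack.extend(adj[x] - seen)
        comps.append(comp)
    return comps


if __name__ == "__main__":
    # Hypothetical toy input built from the name examples in the abstract.
    entities = {
        "R. E. Ellis": {"A. Smith", "B. Lee"},
        "Randy E. Ellis": {"A. Smith", "B. Lee", "C. Park"},
        "Spike Jonze": {"D. Kim"},
        "Jonze, Spike": {"D. Kim"},
    }
    edges = build_graph(entities)
    for group in components(list(entities), edges):
        print(sorted(group))
```

With these inputs, "R. E. Ellis" and "Randy E. Ellis" share only half their name tokens but two of three co-authors, so the combined score (about 0.58) still links them. "Spike Jonze" and "Jonze, Spike" match on both signals once formatting is normalized. In practice, richer metrics (edit distance, Jaro-Winkler) and features such as titles and venues could replace or extend the two signals used here.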
