Abstract

Disambiguating names shared by multiple persons is important in many scenarios. In this work, we propose an unsupervised method, Diting, and a semi-supervised method, Diting++, for author disambiguation. In Diting, we learn a low-dimensional vector to represent each paper in networks that are formed by connecting papers through multiple types of relationships (such as co-authorship). During representation learning, we focus on maximizing the gap between positive edges and negative edges. Further, we propose a clustering algorithm that associates papers with their real-life authors. To make full use of authorship information, which is easy to obtain from authors' homepages, we design Diting++ to improve name disambiguation performance. Diting++ uses the authorship information listed on the authors' homepages to construct label networks and uses a network representation learning method to learn paper representations based on the label networks and the other networks. Further, Diting++ uses a semi-supervised clustering method to partition the learned paper representations into disjoint groups, each of which belongs to a distinct author. By making use of the label information, the clustering method places papers written by the same author in the same group, whereas papers written by different authors fall into different groups. Through extensive experiments, we show that our methods significantly outperform state-of-the-art author disambiguation methods.
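
The idea of maximizing the gap between positive and negative edges can be illustrated with a small sketch. The abstract does not give the actual objective, so the code below assumes a plain hinge (margin) loss over dot-product edge scores. The function name `margin_loss`, the `margin` parameter, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_loss(emb, positive_edges, negative_edges, margin=1.0):
    """Hinge loss summed over every (positive edge, negative edge) pair.

    Assumption: an edge is scored by the dot product of its two paper
    vectors. The loss is zero once every positive edge outscores every
    negative edge by at least `margin`.
    """
    total = 0.0
    for i, j in positive_edges:
        for k, l in negative_edges:
            gap = dot(emb[i], emb[j]) - dot(emb[k], emb[l])
            total += max(0.0, margin - gap)
    return total


# Toy data: papers 0 and 1 are linked (e.g. by a shared co-author),
# papers 0 and 2 are not.
positive_edges = [(0, 1)]
negative_edges = [(0, 2)]

# Embeddings that separate the two kinds of edges.
emb_good = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
# Embeddings that place the unlinked pair closer than the linked pair.
emb_bad = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}

loss_good = margin_loss(emb_good, positive_edges, negative_edges)
loss_bad = margin_loss(emb_bad, positive_edges, negative_edges)
```

Under this assumed loss, `emb_good` incurs no penalty while `emb_bad` does, so minimizing it pushes linked papers together and unlinked papers apart.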

Highlights

  • We focus on author disambiguation, which associates documents with the different persons who share an identical name.

  • We propose a novel network representation learning method for author disambiguation, which encodes multiple types of paper relationships into paper representations.

  • We find that our unsupervised method Diting achieves a Macro-F1 score at least 5.7% higher than the other author disambiguation methods, and our semi-supervised method Diting++ achieves a Macro-F1 score at least 10.9% higher.


Introduction

When we search the literature for documents about one particular author, we may get many results (e.g., papers, web pages) containing the author's name. Even though these documents all contain the name we searched for, they may refer to different people. A search query for the name "Mark Newman" could return a physicist who works at the University of Michigan, a computer scientist who works at the same university, and so on. Beyond literature search, the ambiguous name problem appears in many other fields, such as law enforcement and bibliometrics.
