Person Name Disambiguation in the Web Using Adaptive Threshold Clustering

Agustín D Delgado, Víctor Fresno, Raquel Martínez, Soto Montalvo

doi:10.1002/asi.23810

Abstract

In this article, we present a new clustering algorithm for Person Name Disambiguation in web search results. The algorithm groups web results according to the individuals they refer to. The best state‐of‐the‐art approaches require training data to learn the thresholds that decide when webpages should be grouped. However, the ambiguity level of a person name on the web cannot be estimated in advance, and the results of those methods depend strongly on the thresholds obtained from the training collections. We introduce the concept of an adaptive threshold, which avoids the need for a prior supervised learning process and depends only on the content of the compared documents to decide whether they refer to the same person. We evaluated our approach on three datasets, achieving results close to those of the most successful state‐of‐the‐art algorithms that require such a learning process and outperforming algorithms that do not require it.

Full Text