A fast and integrative algorithm for clustering performance evaluation in author name disambiguation

Jinseok Kim

doi:10.1007/s11192-019-03143-7

Abstract

Author name disambiguation results are often evaluated with measures such as Cluster-F, K-metric, Pairwise-F, Splitting & Lumping Error, and B-cubed. Although these measures use distinct evaluation schemes, this paper shows that they can all be calculated within a single framework through a common set of steps. These steps compare truth and predicted clusters using two hash tables that record name instances with their predicted cluster indices and the frequencies of those indices per truth cluster. This integrative calculation greatly reduces runtime and scales to clustering tasks involving millions of name instances, which can be evaluated within a few seconds. During the integration process, B-cubed and K-metric are shown to produce the same precision and recall scores. In particular, name instance pairs for Pairwise-F are counted in this framework using a heuristic that runs faster than a state-of-the-art algorithm. Details of the integrative calculation are described with examples and pseudo-code to help scholars implement each measure easily and validate the correctness of their implementations. The integrative calculation will also help scholars compare the similarities and differences of multiple measures before selecting those that best characterize the clustering performance of their disambiguation methods.
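
As an illustration of the integrative idea, the following is a minimal Python sketch, not the paper's pseudo-code. It assumes one plausible reading of the two hash tables (a map from name instance to predicted cluster index, and a per-truth-cluster counter of predicted indices) and the commonly used definitions of Pairwise-F, B-cubed, K-metric, and Cluster-F. The names `evaluate`, `ratio`, and `f1` and the toy data are hypothetical, and Splitting & Lumping Error is omitted. Pairs are counted from cluster sizes with binomial coefficients, which is a common shortcut and not necessarily the heuristic the paper uses.

```python
from collections import Counter, defaultdict
from math import comb, sqrt


def ratio(num, den):
    # Assumed convention: an empty denominator yields 0.0.
    return num / den if den else 0.0


def f1(p, r):
    # Harmonic mean of precision and recall.
    return ratio(2 * p * r, p + r)


def evaluate(truth, pred):
    """Score a clustering; `truth` and `pred` map instance id -> cluster label."""
    # Table 1 (assumed): name instance -> predicted cluster index, i.e. `pred`.
    # Table 2 (assumed): truth cluster -> frequency of each predicted index.
    overlap = defaultdict(Counter)
    for inst, t in truth.items():
        overlap[t][pred[inst]] += 1

    n = len(truth)
    truth_size = {t: sum(c.values()) for t, c in overlap.items()}
    pred_size = Counter(pred[inst] for inst in truth)

    # Pairwise: count pairs from sizes instead of enumerating them.
    both = sum(comb(k, 2) for c in overlap.values() for k in c.values())
    pair_p = ratio(both, sum(comb(s, 2) for s in pred_size.values()))
    pair_r = ratio(both, sum(comb(s, 2) for s in truth_size.values()))

    # B-cubed precision and recall; the same sums are K-metric's ACP and AAP.
    b3_p = ratio(sum(k * k / pred_size[p]
                     for c in overlap.values() for p, k in c.items()), n)
    b3_r = ratio(sum(k * k / truth_size[t]
                     for t, c in overlap.items() for k in c.values()), n)

    # Cluster-F: a predicted cluster counts only if it equals a truth cluster.
    exact = sum(1 for c in overlap.values()
                if len(c) == 1 and pred_size[next(iter(c))] == next(iter(c.values())))
    clu_p = ratio(exact, len(pred_size))
    clu_r = ratio(exact, len(truth_size))

    return {
        "pairwise": (pair_p, pair_r, f1(pair_p, pair_r)),
        "b_cubed": (b3_p, b3_r, f1(b3_p, b3_r)),
        "k_metric": (b3_p, b3_r, sqrt(b3_p * b3_r)),
        "cluster": (clu_p, clu_r, f1(clu_p, clu_r)),
    }


# Toy data: two true authors (A, B); the prediction lumps a3 with B's records.
truth = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
pred = {"a1": 1, "a2": 1, "a3": 2, "b1": 2, "b2": 2}
scores = evaluate(truth, pred)
```

In this sketch B-cubed and K-metric share the same precision and recall values, in line with the equivalence stated in the abstract, and differ only in how the two are combined (harmonic mean versus geometric mean). A single pass over the name instances fills the overlap table, and every measure is then derived from cluster sizes and overlap counts alone.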


Journal: Scientometrics
Publication Date: Jun 14, 2019
Citations: 14
