Ethnicity-based name partitioning for author name disambiguation using supervised machine learning.

Jinseok Kim, Jenna Kim, Jason Owen-Smith

doi:10.1002/asi.24459

Abstract

In several author name disambiguation studies, some ethnic name groups, such as East Asian names, are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested either on the entire data or on individual name groups. Results show that ethnicity-based name partitioning can substantially improve disambiguation performance because the individual models are better suited to their respective name groups. The improvements occur across all ethnic name groups, with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity-specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes, as well as for one in which all ethnic name groups are controlled to have the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.
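The comparison described in the abstract (one model for all name pairs versus one model per name group) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the group labels `group_a` and `group_b`, the single coauthor-similarity feature, and the one-threshold classifier, which stands in for the four machine learning algorithms the study actually compares. Accuracy is measured on the training pairs themselves, so the numbers only show the mechanics and do not reproduce any result from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: (name_group, coauthor_similarity, is_same_author).
# Group labels and similarity values are invented for illustration only.
PAIRS = [
    ("group_a", 0.9, 1), ("group_a", 0.8, 1), ("group_a", 0.6, 0), ("group_a", 0.5, 0),
    ("group_b", 0.4, 1), ("group_b", 0.3, 1), ("group_b", 0.1, 0), ("group_b", 0.0, 0),
]


def fit_stump(rows):
    """Return the threshold t that maximizes training accuracy of the rule
    'predict a match when similarity >= t'. rows is a list of (similarity, label).
    Ties are broken in favor of the smallest threshold."""
    best_t, best_correct = None, -1
    for t in sorted({s for s, _ in rows}):
        correct = sum((s >= t) == bool(y) for s, y in rows)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def accuracy(threshold_for, pairs):
    """Share of pairs classified correctly; threshold_for maps a group to its threshold."""
    correct = sum((s >= threshold_for(g)) == bool(y) for g, s, y in pairs)
    return correct / len(pairs)


# Baseline: a single model fitted on all name pairs, ignoring the group.
global_t = fit_stump([(s, y) for _, s, y in PAIRS])

# Partitioned: split the pairs by name group, then fit one model per group.
rows_by_group = defaultdict(list)
for group, s, y in PAIRS:
    rows_by_group[group].append((s, y))
group_t = {group: fit_stump(rows) for group, rows in rows_by_group.items()}

global_acc = accuracy(lambda g: global_t, PAIRS)
partitioned_acc = accuracy(lambda g: group_t[g], PAIRS)

print("global threshold:", global_t, "accuracy:", global_acc)
print("per-group thresholds:", group_t, "accuracy:", partitioned_acc)
```

In this toy setup the two groups separate matches from nonmatches at different similarity levels, so a single threshold has to compromise while the per-group thresholds do not. A real experiment would use held-out test data, several features, and proper classifiers.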

Journal: Journal of the American Society for Information Science and Technology
Publication Date: Feb 23, 2021
Citations: 9
License type: CC BY 4.0

Similar Papers

The impact of imbalanced training data on machine learning for author name disambiguation
Jinseok Kim ... Jenna Kim
Scientometrics | VOL. 117
27 Jul 2018

Generating automatically labeled data for author name disambiguation: an iterative clustering method
Jinseok Kim ... Jason Owen-Smith
Scientometrics | VOL. 118
29 Nov 2018

A review of author name disambiguation techniques for the PubMed bibliographic database
Debarshi Kumar Sanyal ... Plaban Kumar Bhowmick
Journal of Information Science | VOL. 47
01 Dec 2019

Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
Jinseok Kim ... Jinmo Kim
Journal of Information Science | VOL. 49
31 May 2021
