Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality

Alexander D Vanhelene, Ishaani Khatri, C Beau Hilton, Sanjay Mishra, Ece D Gamsiz Uzun, Jeremy L Warner

doi:10.1371/journal.pdig.0000456

Abstract

Meta-researchers commonly use tools that infer gender from first names, especially when studying gender disparities. However, these tools vary in accuracy, ease of use, and cost. The objective of this study was to compare the accuracy and cost of the commercial services Genderize and Gender API and the open-source gender R package. Differences in binary gender prediction accuracy among the three tools were evaluated on a multinational dataset of 32,968 gender-labeled clinical trial authors. Additionally, two datasets from previous studies, with 5,779 and 6,131 names, respectively, were re-evaluated with current implementations of Genderize and Gender API. The accuracy of Genderize and Gender API was compared both with and without supplying the trialists’ country of origin in the API call. The accuracy of the gender R package was evaluated only without countries of origin. Accuracy was defined as the percentage of correct gender predictions, and accuracy differences between methods were evaluated using McNemar’s test. Genderize and Gender API demonstrated 96.6% and 96.1% accuracy, respectively, when countries of origin were not supplied in the API calls. Both services were most accurate for German authors, with accuracies greater than 98%, and least accurate for South Korean, Chinese, Singaporean, and Taiwanese authors, with accuracies below 82%. Genderize can provide accuracy similar to that of Gender API while being 4.85 times less expensive. The gender R package achieved below 86% accuracy on the full dataset. In the replication studies, Genderize and Gender API performed better than reported in the original publications. Our results indicate that Genderize and Gender API achieve similar accuracy on a multinational dataset. The gender R package is uniformly less accurate than Genderize and Gender API.
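
To make the evaluation described in the abstract concrete, the following is a minimal Python sketch of the two quantities it names: accuracy as the percentage of correct predictions, and McNemar's test on the paired predictions of two tools. It is an illustration under stated assumptions, not the authors' code. The function names and the toy labels are hypothetical, and the exact (binomial) form of McNemar's test is assumed here because the abstract does not say which variant was used.

```python
from math import comb


def accuracy(predicted, actual):
    """Percentage of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def mcnemar_exact(pred_a, pred_b, actual):
    """Exact two-sided McNemar test on paired predictions.

    Only discordant names matter:
      b = tool A correct, tool B wrong
      c = tool A wrong, tool B correct
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(pa == t and pb != t for pa, pb, t in zip(pred_a, pred_b, actual))
    c = sum(pa != t and pb == t for pa, pb, t in zip(pred_a, pred_b, actual))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the tools never disagree in correctness
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data, not taken from the study.
truth = ["F", "M", "F", "M", "F", "M", "F", "M"]
tool_a = ["F", "M", "F", "M", "F", "M", "M", "M"]
tool_b = ["F", "M", "M", "F", "F", "F", "M", "M"]

if __name__ == "__main__":
    print(accuracy(tool_a, truth), accuracy(tool_b, truth))
    print(mcnemar_exact(tool_a, tool_b, truth))
```

Because McNemar's test uses only the names on which exactly one tool is correct, two tools with nearly identical overall accuracy can still differ significantly if their errors fall on different names.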

Journal: PLOS Digital Health, Vol. 3
Publication Date: Oct 29, 2024
License type: CC BY 4.0
