Abstract

We examined the usefulness (precision) and completeness (recall) of the Author-ity author disambiguation for PubMed articles by associating articles with scientists funded by the National Institutes of Health (NIH). In doing so, we exploited established unique identifiers, the Principal Investigator (PI) IDs that the NIH assigns to funded scientists. Analyzing a set of 36,987 NIH scientists who received their first R01 grant between 1985 and 2009, we identified 355,921 articles in PubMed that allowed us to evaluate the precision and recall of the Author-ity disambiguation. Across these articles, Author-ity identified the NIH scientists with 99.51% precision and 99.64% recall. Precision and recall, moreover, appeared stable across common and uncommon last names, across ethnic backgrounds, and across levels of scientist productivity.
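To make the evaluation idea concrete, the sketch below scores a disambiguation against a reference labeling using pairwise precision and recall, a common metric for author disambiguation. It is an illustration only: the paper's exact precision and recall formulas may differ, and the record layout (one `(pi_id, authority_id)` tuple per authorship) and the function name `pairwise_precision_recall` are hypothetical. PI IDs play the role of the reference labels and Author-ity IDs the role of the predicted clusters.

```python
from collections import Counter


def pairwise_precision_recall(records):
    """Pairwise precision/recall of Author-ity clusters against PI IDs.

    Hypothetical input: a list of (pi_id, authority_id) tuples, one per
    authorship. Two authorships are "linked" if they share a key.
    """

    def linked_pairs(keys):
        # Count unordered pairs of authorships that share the same key:
        # a group of n authorships contributes n * (n - 1) / 2 pairs.
        counts = Counter(keys)
        return sum(n * (n - 1) // 2 for n in counts.values())

    # Pairs that Author-ity places in the same cluster (predicted links).
    predicted = linked_pairs(authority_id for _, authority_id in records)
    # Pairs that belong to the same NIH scientist (reference links).
    reference = linked_pairs(pi_id for pi_id, _ in records)
    # Pairs linked by both: same PI ID and same Author-ity ID.
    correct = linked_pairs(records)

    # With no pairs to judge, treat the score as perfect.
    precision = correct / predicted if predicted else 1.0
    recall = correct / reference if reference else 1.0
    return precision, recall
```

For example, with `records = [("PI1", "A1"), ("PI1", "A1"), ("PI1", "A2"), ("PI2", "A2")]`, Author-ity links two pairs, one of which is correct (precision 0.5), and recovers one of the three pairs belonging to PI1 (recall 1/3). Counting pairs per group instead of enumerating them keeps the computation linear in the number of authorships.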

Highlights

  • The PubMed database contains the most comprehensive listing of articles in the life sciences

  • Ideally, one would assess the accuracy of the Author-ity IDs against another set of author identifications known to contain few, if any, errors. We developed such an assessment using the Principal Investigator IDs (PI IDs) that the National Institutes of Health (NIH) assigns to the scientists it funds through grants

  • Further decomposing the mismatches, we found that mis-integrated matches never associated more than three PI IDs with a single Author-ity ID
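The mis-integration check in the last highlight can be sketched as grouping authorships by Author-ity ID and counting the distinct PI IDs in each group; any group with more than one PI ID appears to merge different NIH scientists. This is a minimal illustration under assumptions: the `(pi_id, authority_id)` record layout and the function name `mis_integrated` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def mis_integrated(records):
    """Return {authority_id: number of distinct PI IDs} for merged clusters.

    Hypothetical input: a list of (pi_id, authority_id) tuples, one per
    authorship. Only Author-ity IDs linked to more than one PI ID are kept.
    """
    # Collect the distinct PI IDs seen under each Author-ity ID.
    pis_by_authority = defaultdict(set)
    for pi_id, authority_id in records:
        pis_by_authority[authority_id].add(pi_id)

    # Keep clusters that span several NIH scientists.
    return {
        authority_id: len(pi_ids)
        for authority_id, pi_ids in pis_by_authority.items()
        if len(pi_ids) > 1
    }
```

Taking `max(mis_integrated(records).values(), default=0)` then gives the largest number of PI IDs attached to any single Author-ity ID, which is the quantity the highlight reports as never exceeding three.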


Summary

Introduction

The PubMed database contains the most comprehensive listing of articles in the life sciences. At the time of our writing, PubMed contained more than 25 million articles; because each of these articles, on average, has more than one author, the database includes more than 70 million authorships [1]. Tracing individuals over time would allow researchers to explore a variety of questions relevant to the science of science policy: Do life scientists benefit from cross-institutional or international collaboration? Do men and women differ in their publication trajectories? The difficulty in answering these questions lies in determining whether authorships on two or more different articles represent the same individual or different people. Many different people may have the same name, and an individual's name and affiliation sometimes change over time. Articles may also list only authors' initials instead of their full first names.
