Abstract

An author identifier (ID) is essential for many downstream tasks, such as co-author network analysis and scientist mobility analysis. Although PubMed is a widely used database, the National Institutes of Health (NIH) does not officially provide author IDs for it, which restricts some identifier-based research and systems. This study exploited three open bibliographic databases, AMiner, Microsoft Academic Graph (MAG) and Semantic Scholar (S2), to associate author IDs with PubMed. For this purpose, paper linking and author linking were performed to mine paper and author links between PubMed and these databases. The performance of author name disambiguation (AND) was evaluated on two datasets. Our findings suggest that, in terms of link completeness, S2 covers the full volume of PubMed. With respect to the correctness of author IDs, S2 and MAG achieved better performance than AMiner. The best F1 score among the three available identifiers is below 90%, indicating that AND for large-scale databases remains a difficult task and that further improvement is needed. We have made the final dataset publicly available to facilitate future research.
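The abstract reports F1 scores for AND without stating which F1 variant is used. As a minimal sketch, assuming the common pairwise definition (precision and recall computed over pairs of author mentions that receive the same ID), the Python below shows how author IDs from a linked database could be scored against a gold-standard assignment. The function names and the mention keys (`"m1"`, `"m2"`, ...) are hypothetical; in practice a mention might be keyed by a PubMed ID plus author position.

```python
from itertools import combinations


def same_author_pairs(assignment):
    """Return the set of unordered mention pairs that share an author ID.

    `assignment` maps a mention key (hypothetical, e.g. "m1") to an author ID.
    """
    clusters = {}
    for mention, author_id in assignment.items():
        clusters.setdefault(author_id, []).append(mention)
    pairs = set()
    for members in clusters.values():
        # Sorting gives each unordered pair a single canonical (a, b) form.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f1(predicted, gold):
    """Pairwise precision, recall and F1 of predicted IDs against gold IDs.

    Only mentions present in both assignments are scored (an assumption;
    unlinked mentions could instead be counted as errors).
    """
    shared = predicted.keys() & gold.keys()
    pred_pairs = same_author_pairs({m: predicted[m] for m in shared})
    gold_pairs = same_author_pairs({m: gold[m] for m in shared})
    tp = len(pred_pairs & gold_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Gold: m1, m2, m3 are one person, m4 another.
# Prediction wrongly splits m3 off and merges it with m4.
gold = {"m1": "A", "m2": "A", "m3": "A", "m4": "B"}
predicted = {"m1": "X", "m2": "X", "m3": "Y", "m4": "Y"}
p, r, f1 = pairwise_f1(predicted, gold)
print(p, round(r, 4), round(f1, 4))  # 0.5 0.3333 0.4
```

The pairwise variant penalizes both over-splitting (lower recall) and over-merging (lower precision) of an author's papers, which are the two typical AND failure modes. Cluster-level measures such as B-cubed are a common alternative and may give different numbers.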
