Abstract

Machine learning for author name disambiguation is usually conducted on training and test subsets of labeled data created for a specific task. As a result, disambiguation models trained on heterogeneous labeled data are often inapplicable to other tasks that either do not use the same labeled data or do not use any labeled data at all. This article explores transfer learning in a new context, author name disambiguation. We focus on cases where a disambiguation task that lacks labeled training data uses models trained on labeled data generated for other tasks. For this purpose, two labeled source datasets are used to train disambiguation models, which are then applied to three target test datasets that lack labeled training data. Our results show that transfer learning can produce disambiguation performance similar to that achievable by traditional machine learning, in which training and test datasets come from the same labeled data source. Such performance is possible when the source training datasets have feature distributions similar to those of the target test datasets. This study suggests that, through transfer learning, rich disambiguation models from previous studies can be retained and reused across ambiguous bibliographic data from different fields and data sources. It also motivates further research on how to correct feature distribution differences between source and target datasets, so that transfer learning in author name disambiguation can extend beyond the model sharing explored in this research.
