Abstract

This study presents a novel framework for integrating scientific categorization labels from diverse sources, addressing the inconsistencies introduced by differing classification systems in the scientific literature. Using a SciBERT-based encoder and a classification model, we propose a method to map and integrate labels across databases. The approach encodes the titles of scientific documents as vector embeddings, trains a classification model on these embeddings, and then applies the trained model to categorize and harmonize labels from other sources. We apply this methodology to datasets from the 2017 National Natural Science Foundation of China (NSFC) and the Chinese Science Citation Database (CSCD), demonstrating the framework’s ability to improve label consistency and thematic coherence across multiple scientific disciplines. The results indicate that harmonized labels make the scientific literature easier to navigate and interpret, and suggest that the approach can support more efficient, integrated management of scientific knowledge.
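
The three steps described above (encode titles, train a classifier on one source's labels, then use it to map another source's labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a bag-of-words vector stands in for the SciBERT encoder, a nearest-centroid rule stands in for the classification model, the majority vote used to derive the label mapping is one plausible choice that the abstract does not specify, and the titles and labels are invented toy data rather than NSFC or CSCD records.

```python
from collections import Counter, defaultdict
import math


def build_vocab(titles):
    """Assign an index to every distinct lowercase token in the titles."""
    vocab = {}
    for title in titles:
        for token in title.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(title, vocab):
    """Stand-in encoder (assumption): L2-normalised bag-of-words vector.

    The paper uses a SciBERT-based encoder; this toy version only keeps
    the sketch dependency-free. Tokens outside the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for token in title.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def train_centroids(titles, labels, vocab):
    """Stand-in classifier (assumption): one mean embedding per label."""
    grouped = defaultdict(list)
    for title, label in zip(titles, labels):
        grouped[label].append(encode(title, vocab))
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(title, centroids, vocab):
    """Return the label whose centroid has the largest dot product."""
    vec = encode(title, vocab)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(vec, centroids[label])),
    )


def map_labels(titles_b, labels_b, centroids, vocab):
    """Map each source-B label to the source-A label predicted most often."""
    votes = defaultdict(Counter)
    for title, label_b in zip(titles_b, labels_b):
        votes[label_b][predict(title, centroids, vocab)] += 1
    return {
        label_b: counts.most_common(1)[0][0]
        for label_b, counts in votes.items()
    }


# Hypothetical toy data: source A plays the role of the training database.
titles_a = [
    "graph neural networks for molecules",
    "deep learning for image recognition",
    "protein folding in yeast cells",
    "gene expression in plant cells",
]
labels_a = ["Computer Science", "Computer Science", "Biology", "Biology"]

# Hypothetical toy data: source B uses a different label vocabulary.
titles_b = [
    "neural networks for image segmentation",
    "deep learning for speech",
    "gene regulation in yeast",
    "protein expression in cells",
]
labels_b = ["Informatics", "Informatics", "Life Sciences", "Life Sciences"]

vocab = build_vocab(titles_a)
centroids = train_centroids(titles_a, labels_a, vocab)
mapping = map_labels(titles_b, labels_b, centroids, vocab)
```

On this toy data, `mapping` pairs "Informatics" with "Computer Science" and "Life Sciences" with "Biology". In a setting closer to the one described, `encode` would presumably be replaced by embeddings from a SciBERT model and `train_centroids` by a trained classifier, while the mapping step could stay the same.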
