Abstract
Semantic representation methods play a crucial role in text mining tasks. Although numerous approaches have been proposed and compared in text mining research, little attention has been paid to comparing semantic representation methods specifically for publication keywords in bibliometric studies. Without such practical evidence, researchers have difficulty selecting a suitable method for obtaining keyword vectors for downstream bibliometric tasks, and may obtain suboptimal results. To address this gap, this study experimentally compares typical semantic representation methods for keywords, aiming to provide quantitative evidence for bibliometric studies. The experiment uses keyword clustering as the fundamental task and evaluates 22 variations of five typical methods across four scientific domains. The methods compared are co-word matrix, co-word network, word embedding, network embedding, and “semantic + structure” integration. Each method is assessed by how well its clustering results fit the “evaluation standard” specific to each domain. The empirical findings show that the co-word matrix performs poorly, whereas the co-word network and word embedding perform satisfactorily. Among the five network embedding algorithms, LINE and Node2Vec outperform DeepWalk, Struc2Vec, and SDNE. Notably, both the “pre-training and fine-tuning” model and the “semantic + structure” model yield unsatisfactory results. Despite these differences in performance, no single approach is universally superior. When selecting a method in practice, factors such as corpus size and the semantic cohesion of domain keywords should be considered together.
This study advances the understanding of semantic representation methods for keyword analysis and offers bibliometric researchers practical recommendations for selecting appropriate methods.
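As an illustration of the simplest method compared above, the following is a minimal sketch of a co-word matrix, assuming each paper is reduced to its list of author keywords. The toy corpus, the function names `co_word_matrix` and `cosine`, and the use of cosine similarity between matrix rows are all assumptions made for illustration; they are not taken from the study and do not reproduce its experimental setup.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: each inner list holds the keywords of one paper.
papers = [
    ["bibliometrics", "keyword clustering", "word embedding"],
    ["bibliometrics", "co-word analysis", "keyword clustering"],
    ["word embedding", "network embedding"],
    ["co-word analysis", "bibliometrics"],
]


def co_word_matrix(papers):
    """Count how often each pair of keywords appears in the same paper."""
    vocab = sorted({kw for paper in papers for kw in paper})
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for paper in papers:
        # Deduplicate within a paper so a repeated keyword is counted once.
        for a, b in combinations(sorted(set(paper)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two keyword vectors (rows of the matrix)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vocab, matrix = co_word_matrix(papers)
for kw, row in zip(vocab, matrix):
    print(f"{kw:20s} {row}")
```

Each row of the matrix serves as the vector for one keyword, so two keywords are treated as similar when they co-occur with the same other keywords. In a clustering setting, these row vectors (or pairwise similarities computed with `cosine`) would be passed to a clustering algorithm of the reader's choice.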