Abstract

Topic analysis studies topic evolution and trends to help researchers understand how knowledge evolves and is created. This paper develops a novel framework for topic evolution analysis, which we use to demonstrate, forecast, and explain topic evolution from the perspective of the geometric motion of topic embeddings generated by pretrained language models. Our dataset comprises approximately 15 million computer science papers, with 7,000 “fields of study” representing the topics. First, using the hyperplane and its normal vector obtained from a support vector machine, we show that over 80% of topics have undergone clear motion in the semantic vector space. Next, we verify that this motion is predictable by using three vector regression models to predict topic embeddings. Finally, we employ a decoder to explain the predicted motion; the forecast embeddings capture about 50% of unseen topics. Our framework shows that topic evolution can be analyzed through the geometric motion of topic embeddings, and that the semantic motion of old topics nurtures new topics. This study opens new research pathways in topic analysis and sheds light on the mechanism of topic evolution from a novel geometric perspective.

Peer Review https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00344
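The forecast-then-decode idea can be illustrated with a minimal sketch. It assumes each topic has one embedding per year and makes two plain substitutions: the three vector regression models are replaced by per-dimension linear extrapolation, and the decoder is replaced by a nearest-neighbour lookup, by cosine similarity, over a hypothetical vocabulary of candidate topic embeddings. The function names and the toy two-dimensional data are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_centered_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line for one embedding dimension, kept in centred form
    (mean_t, mean_y, slope) to avoid cancellation with large year values."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    return mean_t, mean_y, cov_ty / var_t


def forecast_embedding(history: Dict[int, Vector], target_year: int) -> Vector:
    """Hypothetical stand-in for a vector regression model: extrapolate each
    dimension of a topic embedding independently to target_year."""
    years = sorted(history)
    dim = len(history[years[0]])
    forecast = []
    for d in range(dim):
        mean_t, mean_y, slope = fit_centered_line(years, [history[y][d] for y in years])
        forecast.append(mean_y + slope * (target_year - mean_t))
    return forecast


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def decode(embedding: Vector, vocabulary: Dict[str, Vector]) -> str:
    """Hypothetical stand-in for the decoder: return the candidate topic whose
    embedding is most cosine-similar to the forecast embedding."""
    return max(vocabulary, key=lambda name: cosine(embedding, vocabulary[name]))


# Toy data (hypothetical): a topic drifting from the first axis toward the second.
history = {2018: [1.0, 0.0], 2019: [0.8, 0.2], 2020: [0.6, 0.4]}
vocabulary = {"topic_a": [1.0, 0.0], "topic_b": [0.0, 1.0], "topic_c": [0.7, 0.7]}

predicted = forecast_embedding(history, 2022)
print(predicted, decode(predicted, vocabulary))
```

Here the toy topic's extrapolated embedding lands closest to `topic_b`, a candidate absent from its history, which mirrors in miniature the claim that the motion of old topics points toward new ones. Linear extrapolation is chosen only because it makes the motion explicit; real topic embeddings from a pretrained language model would have hundreds of dimensions and call for the stronger regression models the study evaluates.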