Traditional topic models lose accuracy when co-occurring vocabulary in the corpus is sparse, whereas conventional word embedding models over-emphasize contextual semantic information and fail to capture domain-specific features of the text. This paper proposes a hybrid semantic representation method that combines a concept-aware topic model with a weighted word embedding model. Specifically, we construct a topic model that incorporates the Probase concept knowledge base, use it to perform topic clustering, and obtain a topic semantic representation. In addition, we design a weighted word embedding model to strengthen the representation of contextual semantic information in the text. A feature-based information fusion model then integrates the two textual representations into a single hybrid semantic representation. The proposed model was evaluated on several English composition test sets. The results show that it achieves higher accuracy than existing text representation methods and has practical value.
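The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme or the fusion operator, so this sketch assumes TF-IDF weights for the word embeddings and plain concatenation of the two normalized vectors. The function names, the toy tokens, the random word vectors, and the topic distribution are all hypothetical, and the Probase-based topic model is replaced here by a hand-written topic distribution.

```python
import math
from collections import Counter

import numpy as np


def idf_weights(docs):
    """Smoothed inverse document frequency for each token in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def weighted_embedding(doc, embeddings, idf, dim):
    """TF-IDF-weighted average of the word vectors in one document (assumed scheme)."""
    tf = Counter(tok for tok in doc if tok in embeddings)
    if not tf:
        # No known tokens: fall back to a zero vector.
        return np.zeros(dim)
    weights = np.array([count * idf.get(tok, 1.0) for tok, count in tf.items()])
    vectors = np.stack([embeddings[tok] for tok in tf])
    return weights @ vectors / weights.sum()


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def hybrid_representation(topic_dist, doc, embeddings, idf, dim):
    """Feature-level fusion by concatenating the two normalized views (assumed operator)."""
    topic_vec = l2_normalize(np.asarray(topic_dist, dtype=float))
    context_vec = l2_normalize(weighted_embedding(doc, embeddings, idf, dim))
    return np.concatenate([topic_vec, context_vec])


# Toy data: hypothetical tokens, random vectors, made-up topic distribution.
docs = [["essay", "school", "teacher"], ["essay", "travel", "city"]]
rng = np.random.default_rng(0)
dim = 4
embeddings = {tok: rng.normal(size=dim) for doc in docs for tok in doc}
idf = idf_weights(docs)
topic_dist = [0.7, 0.2, 0.1]  # stand-in for the topic model's output

hybrid = hybrid_representation(topic_dist, docs[0], embeddings, idf, dim)
print(hybrid.shape)  # (7,) = 3 topic dimensions + 4 embedding dimensions
```

Normalizing each view before concatenation keeps either one from dominating a downstream classifier purely because of its scale. A learned fusion layer would be an equally plausible reading of "feature-based information fusion".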