Abstract
Chinese literary texts contain a large vocabulary of vivid imagery. This makes it hard for ordinary readers to judge where one word ends and the next begins. Current pre-trained language models also struggle to learn the implicit knowledge in such texts, which makes machine semantic analysis difficult. This study trains a conditional random field (CRF) model for semantic analysis of Chinese literary texts that recognizes the semantic relationship between two words. It then trains support vector machine (SVM) classifiers for easily confused categories. When the CRF model outputs one of two confusable relations, the SVM discriminates further between them to determine the final semantic relation of the word pair. Finally, the CRF- and SVM-based semantic analysis technique is applied to the LCQMC dataset to produce word segmentation, part-of-speech, and dependency syntax annotations. The model reaches 74.83% accuracy on paraphrase recognition and 92.05% labeled attachment score (LAS) on dependency analysis of Chinese literary texts. The study improves the efficiency of semantic analysis of related Chinese texts and is significant for research on the semantic analysis of terms.
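The abstract does not specify the feature set, the relation inventory, or how the SVMs are trained. The sketch below illustrates only the two-stage cascade it describes, under stated assumptions:

1. A first-stage labeler proposes a relation for a word pair. Here it is a hypothetical stub standing in for the trained CRF model.
2. If that relation belongs to a pair of labels known to be confused, a binary linear SVM re-decides between the two.

The relation names `synonym`, `near_synonym` and `unrelated` are assumptions for illustration. So are the 2-D feature vectors and the use of Pegasos stochastic sub-gradient training for the SVM. None of this is the authors' implementation.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    Labels in ys must be +1 or -1. A constant 1.0 is appended to every
    input so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = data[i], ys[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink toward zero (gradient step on the L2 regularizer) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and step along y*x when the hinge loss is active.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def svm_score(w: List[float], x: Vector) -> float:
    """Signed decision value w . [x, 1]; >= 0 means the positive class."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


class CascadeRelationClassifier:
    """Stage 1 assigns a coarse relation label (stand-in for the CRF model);
    stage 2 re-decides only between pairs of labels known to be confused."""

    def __init__(self, stage1: Callable[[Vector], str]) -> None:
        self.stage1 = stage1
        self.refiners: Dict[FrozenSet[str], Tuple[str, str, List[float]]] = {}

    def add_refiner(self, pos: str, neg: str,
                    xs: List[Vector], labels: List[str]) -> None:
        """Train a binary SVM that separates the confusable labels pos/neg."""
        ys = [1 if lab == pos else -1 for lab in labels]
        self.refiners[frozenset((pos, neg))] = (pos, neg, train_linear_svm(xs, ys))

    def predict(self, x: Vector) -> str:
        coarse = self.stage1(x)
        for pair, (pos, neg, w) in self.refiners.items():
            if coarse in pair:
                return pos if svm_score(w, x) >= 0 else neg
        return coarse  # labels outside every confusable pair pass through


def build_demo() -> CascadeRelationClassifier:
    # Hypothetical stage-1 stub: calls an all-zero feature vector "unrelated"
    # and cannot tell "synonym" from "near_synonym", so it always says "synonym".
    def crf_stub(x: Vector) -> str:
        return "unrelated" if all(v == 0 for v in x) else "synonym"

    clf = CascadeRelationClassifier(crf_stub)
    # Hypothetical 2-D features for word pairs of the two confusable relations.
    xs: List[Vector] = [[2.0, 2.0], [3.0, 1.0], [1.0, 3.0],
                        [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
    labels = ["synonym"] * 3 + ["near_synonym"] * 3
    clf.add_refiner("synonym", "near_synonym", xs, labels)
    return clf


if __name__ == "__main__":
    demo = build_demo()
    for feats in ([3.0, 3.0], [-3.0, -3.0], [0.0, 0.0]):
        print(feats, demo.predict(feats))
```

Running the script labels `[3.0, 3.0]` as `synonym` and `[-3.0, -3.0]` as `near_synonym`, and passes `[0.0, 0.0]` through as `unrelated`. The design keeps one binary SVM per confusable pair rather than retraining a single multi-class model. That way the second stage only has to learn the fine-grained boundary the first stage gets wrong.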