The linguistic characteristics of a literary work reflect the way of thinking embodied in the author's use of language. Starting from the textual value of Chinese language literature, this paper analyzes its spiritual connotation along two dimensions: reading and education. Using web crawler technology, we collect text data from the works of three writers, Ba Jin, Yu Zheng and Qiong Yao, preprocess the data through data cleaning, Chinese word segmentation and de-duplication, and extract text feature values with the TF-IDF algorithm. The documents are then mapped to vectors with the vector space model (VSM), and the parameters of the LDA topic model are estimated with the Gibbs sampling algorithm to better capture topic changes across the texts. The linguistic features are verified in terms of the lexical and similarity features of the texts. The difference in lexical density between Ba Jin's Cold Night and Resting Garden is only 2.1 percentage points, and the verb "to say" occurs 1,213 and 735 times in the two works, respectively. The average sentence lengths of Yu Zheng's and Qiong Yao's works fluctuate within the range [18.49, 34.27], and Qiong Yao's works have a higher thematic concentration than Yu Zheng's. Analyzing the linguistic features of Chinese language literary texts with text mining techniques helps in understanding how authors use language and in promoting innovative paths of expression in literary texts.
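As an illustration of the feature extraction and vector mapping steps, the following is a minimal sketch of TF-IDF weighting and cosine similarity over documents that have already been segmented into words. It is not the paper's implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the unsmoothed weighting tf × log(N / df), and the toy token lists are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map pre-segmented documents (lists of tokens) to TF-IDF vectors.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    Returns the sorted vocabulary and one vector per document.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; real input would come from a Chinese word segmenter.
docs = [
    ["夜", "寒", "说"],
    ["园", "说", "说"],
    ["夜", "寒", "说"],
]
vocab, vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Under this weighting, a word that appears in every document (here "说") receives a weight of zero, so similarity is driven by the words that distinguish one text from another.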