Abstract

Text classification, which aims to predict the category of a given text from its semantic information, is a fundamental component of natural language processing applications such as spam detection and user intent classification. Existing Chinese short text classification models mostly use neural network methods to extract classification features, but they still suffer from insufficient feature extraction and poor classification performance. This paper constructs a Chinese short text classification model that incorporates multi-level semantic features. The model first uses a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU) to extract character- and word-level features of texts; it then builds a multi-level semantic extraction network that produces multi-level semantic representations by capturing local and contextual features of texts, screening them, and fusing them with the character and word features; finally, it classifies texts with a Softmax classifier. Experimental results on THUCNews show that our model's performance improves on existing models, with classification accuracy reaching 93.59%.

Keywords: Chinese short text classification, Convolutional neural network, Multi-level semantic features, Multi-Loss, Bidirectional gated recurrent unit
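The pipeline described above (CNN for local features, BiGRU for contextual features, fusion, then a Softmax classifier) can be sketched roughly as follows. This is a minimal illustrative skeleton, not the paper's actual implementation: the class name, layer sizes, and pooling choices are assumptions, and the multi-level screening network is reduced to a simple concatenation for brevity.

```python
# Hypothetical sketch of the described architecture (dimensions and names
# are illustrative, not taken from the paper).
import torch
import torch.nn as nn

class MultiLevelClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden=64, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # CNN branch: captures local n-gram features
        self.conv = nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1)
        # BiGRU branch: captures contextual features in both directions
        self.bigru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Fusion of local (hidden) and contextual (2 * hidden) features
        self.fc = nn.Linear(hidden + 2 * hidden, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                         # (B, T, E)
        local = torch.relu(self.conv(x.transpose(1, 2)))  # (B, H, T)
        local = local.max(dim=2).values                   # global max pool -> (B, H)
        ctx, _ = self.bigru(x)                            # (B, T, 2H)
        ctx = ctx.mean(dim=1)                             # mean pool -> (B, 2H)
        fused = torch.cat([local, ctx], dim=1)            # fused representation
        return torch.log_softmax(self.fc(fused), dim=1)   # Softmax classification

model = MultiLevelClassifier()
log_probs = model(torch.randint(0, 5000, (2, 20)))
print(tuple(log_probs.shape))  # (2, 10)
```

In the paper's full model, the fusion step is replaced by the multi-level semantic extraction network, which screens the local and contextual features before combining them with the character and word features.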
