Abstract

As the core of Internet text processing and text mining, text categorization has become a key research issue in natural language processing. Traditional methods rely mainly on shallow machine learning. Deep learning has developed rapidly and has achieved major breakthroughs in image recognition and speech recognition, which further demonstrates the feature-learning ability of deep models. Using the THUCNews dataset and word embeddings, this paper applies TextCNN and an RNN to multi-class text classification. Experimental results show that TextCNN reaches 96.04% accuracy on the test set, with precision, recall, and F1-score above 0.9 for every category. The RNN reaches 94.22% test accuracy, with precision, recall, and F1-score above 0.9 for every category except home. Comparing the two models, the RNN is clearly less satisfactory than TextCNN only on the home category. On the other categories the two models differ little, and further hyperparameter tuning may yield better results.
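To make the TextCNN idea more concrete, here is a minimal, dependency-free Python sketch of a single forward pass. It embeds the input tokens, runs one-dimensional convolutions of several widths over the token sequence, applies ReLU and then max-over-time pooling, and ends with a softmax classifier. This is not the paper's implementation. The vocabulary size, embedding dimension, filter widths and counts, and the ten-class output are hypothetical values. The weights are random rather than trained, so the probabilities only illustrate the data flow.

```python
import math
import random


def conv1d_max_pool(embedded, filt, bias):
    """Slide one filter of width k over the sequence, apply ReLU, and
    keep the largest activation (max-over-time pooling)."""
    k = len(filt)  # filter width in tokens
    if len(embedded) < k:
        raise ValueError("sequence shorter than filter width; pad the input")
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(embedded) - k + 1):
        s = bias
        for i in range(k):
            # Dot product between filter row i and the i-th word vector in the window.
            s += sum(w * x for w, x in zip(filt[i], embedded[start + i]))
        best = max(best, max(0.0, s))  # ReLU, then running max
    return best


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(token_ids, embeddings, filters, out_weights, out_bias):
    # 1. Word-embedding lookup: each token id becomes a dense vector.
    embedded = [embeddings[t] for t in token_ids]
    # 2. One pooled feature per filter, so the feature vector has a fixed size
    #    regardless of sentence length.
    features = [conv1d_max_pool(embedded, f, b) for f, b in filters]
    # 3. Fully connected layer followed by softmax over the classes.
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


# Hypothetical sizes chosen for illustration only (not taken from the paper).
random.seed(0)
VOCAB, DIM, NUM_CLASSES = 50, 8, 10
embeddings = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(VOCAB)]
filters = []
for k in (2, 3, 4):  # hypothetical filter widths
    for _ in range(4):  # hypothetical number of filters per width
        filt = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(k)]
        filters.append((filt, 0.0))
out_weights = [
    [random.uniform(-0.1, 0.1) for _ in range(len(filters))] for _ in range(NUM_CLASSES)
]
out_bias = [0.0] * NUM_CLASSES

probs = textcnn_forward([3, 17, 5, 42, 8, 1], embeddings, filters, out_weights, out_bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(len(probs), predicted)
```

In a trained model, backpropagation would learn the embedding table, filters, and output layer, and inputs would be padded or truncated to a fixed length. Both steps are omitted here. Max-over-time pooling is the design choice that turns a variable-length sentence into one number per filter. This is what lets a plain fully connected layer do the final classification.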
