Analysis and Comparison of Chinese News Text Classification Methods Based on Deep Learning

Jian Chen, Wenxiao Jiang, Zekai Feng

doi:10.54097/hset.v16i.2496

Abstract

As people today consume an ever-increasing amount of information, the volume of Internet news is also growing rapidly. Faced with so many different kinds of news, accurately distinguishing between news types has become a research direction for many scholars. This article uses word clouds to visualize the keywords used in news from different domains. In addition, we use two methods, TF-IDF and TextRank, to identify and analyze the keywords of news in different fields. To compare the performance of various classification methods, we use the THUCNews dataset, which collects historical news from Weibo in ten fields. We evaluate nine machine learning algorithms: SVM, XGBoost, RandomForest, GBDT, GRU, LSTM, CNN, RNN, and MLP. Among these nine models, GRU reaches an accuracy of 96.93%, SVM 96.39%, CNN 94.72%, and RandomForest 92.97%, making each of them stand out among models of its kind. We used word-embedding vectorization for the neural network models and TF-IDF vectorization for the others.
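The two keyword-extraction methods named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' code. The function names `tfidf_keywords` and `textrank_keywords` are hypothetical. The input is assumed to be already word-segmented (Chinese segmentation, for example with a tool such as jieba, is not shown), and the toy English tokens merely stand in for segmented Chinese words. The smoothed IDF formula, the co-occurrence window of 2, the damping factor of 0.85, and the 50 iterations are illustrative choices, not settings reported in the paper.

```python
import math
from collections import Counter, defaultdict


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF.

    docs: list of token lists, assumed to be word-segmented already.
    tf = count / document length; idf = ln((1 + N) / (1 + df)) + 1,
    a smoothed variant chosen here for illustration (an assumption).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    target = docs[doc_index]
    counts = Counter(target)
    scores = {
        term: (count / len(target)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in counts.items()
    }
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Return the top_k terms of one token list ranked by TextRank.

    Distinct terms appearing within `window` consecutive positions are joined
    by an undirected, unweighted edge. Scores are then iterated with
    S(v) = (1 - d) + d * sum(S(u) / deg(u) for each neighbour u of v).
    Window, damping, and iteration count are illustrative assumptions.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)
    scores = {term: 1.0 for term in neighbors}
    for _ in range(iterations):
        # Build the new scores entirely from the previous iteration's values.
        scores = {
            v: (1 - damping)
            + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    corpus = [
        ["stock", "fund", "stock", "market", "stock"],  # toy "finance" article
        ["team", "match", "team", "goal", "market"],    # toy "sports" article
        ["film", "actor", "film", "market"],            # toy "entertainment" article
    ]
    print(tfidf_keywords(corpus, 0))  # ['stock', 'fund', 'market']
    print(textrank_keywords(["stock", "market", "fund", "market", "bond", "market", "stock"]))
    # ['market', 'bond', 'fund']
```

The two scores answer different questions. TF-IDF weighs a word against the whole corpus, so a term such as `market` that appears in every toy article is pushed down. TextRank scores words within a single article from their co-occurrence graph, so the hub word `market` ranks first there.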
