News Short Text Classification Based on Bert Model and Fusion Model

Hongyang Cui,Yibo Yu,Chentao Wang

doi:10.54097/hset.v34i.5482

Abstract

Text classification task is one of the most fundamental tasks in NLP, and the classification of short news text could be the basis for many other tasks. In this paper, we applied a fusion model combining Bert and TextRNN with some modified details to expect higher accuracy of text classification. We used the THUCNews as dataset which consists of two columns one for news text and the other for numbers. The original dataset was seperated into three parts: training set, validation set and test set. Besides, we used BERT model which contains two pre-training tasks and TextRNN model which refers to the use of RNN to solve text classification problems. We trained these two models in parallel, and then the optimal Bert and TextRNN models obtained through training and parameter tuning are added with a fully-connected layer to receive the final results by weighting the efficiency of Bert and TextRNN. The fusion model solves the problem of over-fitting and under-fitting of a single model, and helps to obtain a model with better generalization performance. The experimental results show the sharp change in loss and accuracy as well as the final accuracy of the BERT model. The precision, recall-rate and F1-score are also evaluated in this paper. The accuracy of fusion model of BERT and TextRNN is much better than single Bert model and has a gap to 1.76%.

Full Text