Research on Text Classification Techniques Based on Improved TF-IDF Algorithm and LSTM Inputs

Minhui Liang, Tiansen Niu

doi:10.1016/j.procs.2022.10.064

Abstract

Text classification is a technique that automatically classifies and labels text according to certain rules, and it is widely used in sentiment analysis, intelligent recommendation systems and intelligent question-answering systems. Deep learning-based text classification methods can automatically identify and extract the features in text that are useful for classification, so they can analyse the text content directly, saving much of the labour cost of manual feature extraction. In this paper, the TF-IDF algorithm and the input structure of the bidirectional LSTM were modified. Specifically, we rewrote the TF-IDF formula and processed the texts with a sliding window. Word2vec vectors were used as the word embedding layer, and text features were extracted using a combination of bidirectional LSTM and Text-CNN for classification prediction. The bidirectional LSTM treats texts as sequences to grasp information as a whole, while Text-CNN extracts locally important features at the sentence level. As a result, good accuracy was achieved.
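For orientation, the two preprocessing ideas named in the abstract can be sketched as follows. This is a minimal illustration only: it uses the standard TF-IDF definition, not the rewritten formula proposed in the paper (which the abstract does not state), and the function names `tf_idf` and `sliding_windows`, as well as the window `size` and `stride` parameters, are assumptions introduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard (unmodified) TF-IDF; docs is a list of token lists.

    Assumption: this is the textbook formula, not the paper's improved one.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def sliding_windows(tokens, size, stride):
    """Split a token sequence into fixed-size, possibly overlapping windows.

    Hypothetical helper: trailing tokens that do not fill a whole window
    are dropped, and short texts are returned as a single window.
    """
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]


docs = [["good", "movie"], ["bad", "movie"]]
weights = tf_idf(docs)
windows = sliding_windows(list("abcde"), size=3, stride=2)
```

In this toy corpus, "movie" appears in every document, so its weight is zero, while "good" and "bad" receive positive weights. The windows produced in this way could then be fed to a sequence model one segment at a time.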

Full Text