Abstract

With the rapid development of deep learning, many deep learning models have been widely applied in Natural Language Processing (NLP). The Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN) can achieve high accuracy in text classification tasks. However, the high dimensionality of text features and the large number of parameters that must be trained in deep learning models often make training time-consuming. This paper uses Term Frequency-Inverse Document Frequency (TF-IDF) to remove features with lower weights and extract the key features of a text, obtains the corresponding word vectors through the Word2Vec model, and then feeds them into a CNN-LSTM model. We compared this model with CNN, LSTM, and LSTM-attention methods and found that it significantly reduces the number of model parameters and the training time on both short-text and long-text data sets. The model loses almost no accuracy on long texts, although it does lose some accuracy on short texts. This paper therefore also proposes fusing the original text features to compensate for the accuracy loss caused by the TF-IDF feature extraction.
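To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of TF-IDF-based feature filtering followed by Word2Vec embedding and a small CNN-LSTM classifier. The toy corpus, the keep ratio KEEP_RATIO, the sequence length MAX_LEN, and all layer sizes are illustrative assumptions.

```python
# Illustrative sketch only: TF-IDF term filtering -> Word2Vec -> CNN-LSTM.
# All hyperparameters and the toy corpus below are assumptions for the example.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow.keras import layers, models

EMBED_DIM = 100   # Word2Vec vector size (assumption)
MAX_LEN = 200     # padded sequence length after filtering (assumption)
KEEP_RATIO = 0.7  # fraction of highest-weight TF-IDF terms to keep (assumption)

texts = [["deep", "learning", "models", "classify", "text"],
         ["tf", "idf", "removes", "low", "weight", "features"]]
labels = np.array([0, 1])

# 1) Rank terms by mean TF-IDF weight and keep only the top fraction,
#    dropping low-weight features as described in the abstract.
tfidf = TfidfVectorizer(analyzer=lambda doc: doc)  # docs are pre-tokenized
matrix = tfidf.fit_transform(texts)
weights = np.asarray(matrix.mean(axis=0)).ravel()
vocab = np.array(tfidf.get_feature_names_out())
keep = set(vocab[np.argsort(weights)[::-1][: int(len(vocab) * KEEP_RATIO)]])
filtered = [[w for w in doc if w in keep] for doc in texts]

# 2) Train Word2Vec on the filtered corpus and embed each document
#    as a padded sequence of word vectors.
w2v = Word2Vec(sentences=filtered, vector_size=EMBED_DIM, min_count=1)

def embed(doc):
    vecs = [w2v.wv[w] for w in doc if w in w2v.wv][:MAX_LEN]
    pad = [np.zeros(EMBED_DIM)] * (MAX_LEN - len(vecs))
    return np.stack(vecs + pad)

X = np.stack([embed(d) for d in filtered])

# 3) CNN layer extracts local n-gram features; LSTM models the sequence.
model = models.Sequential([
    layers.Input(shape=(MAX_LEN, EMBED_DIM)),
    layers.Conv1D(64, 3, activation="relu"),
    layers.MaxPooling1D(2),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, labels, epochs=2, verbose=0)
```

Because low-weight terms are removed before embedding, the padded sequences are shorter and denser than with the raw text, which is where the parameter and training-time savings reported in the abstract come from.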
