Abstract

With the rapid development of deep learning, many deep learning models have been widely applied in Natural Language Processing (NLP). The Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN) can achieve high accuracy in text classification tasks. However, the high dimensionality of text features and the large number of parameters that must be trained in deep learning models often make training time-consuming. This paper uses Term Frequency-Inverse Document Frequency (TF-IDF) to remove features with lower weights and extract the key features of a text, obtains the corresponding word vectors through the Word2Vec model, and then feeds them into a CNN-LSTM model. We compared this model with CNN, LSTM, and LSTM-attention methods and found that it significantly reduces the number of model parameters and the training time on both short-text and long-text data sets. The model loses almost no accuracy on long texts, although it does lose some accuracy on short texts. This paper therefore also proposes fusing the original text features to compensate for the accuracy loss caused by the TF-IDF feature extraction.
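To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of TF-IDF-based feature filtering followed by Word2Vec embedding and a small CNN-LSTM classifier. The toy corpus, the keep ratio KEEP_RATIO, the sequence length MAX_LEN, and all layer sizes are illustrative assumptions.

```python
# Illustrative sketch only: TF-IDF term filtering -> Word2Vec -> CNN-LSTM.
# All hyperparameters and the toy corpus below are assumptions for the example.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow.keras import layers, models

EMBED_DIM = 100   # Word2Vec vector size (assumption)
MAX_LEN = 200     # padded sequence length after filtering (assumption)
KEEP_RATIO = 0.7  # fraction of highest-weight TF-IDF terms to keep (assumption)

texts = [["deep", "learning", "models", "classify", "text"],
         ["tf", "idf", "removes", "low", "weight", "features"]]
labels = np.array([0, 1])

# 1) Rank terms by mean TF-IDF weight and keep only the top fraction,
#    dropping low-weight features as described in the abstract.
tfidf = TfidfVectorizer(analyzer=lambda doc: doc)  # docs are pre-tokenized
matrix = tfidf.fit_transform(texts)
weights = np.asarray(matrix.mean(axis=0)).ravel()
vocab = np.array(tfidf.get_feature_names_out())
keep = set(vocab[np.argsort(weights)[::-1][: int(len(vocab) * KEEP_RATIO)]])
filtered = [[w for w in doc if w in keep] for doc in texts]

# 2) Train Word2Vec on the filtered corpus and embed each document
#    as a padded sequence of word vectors.
w2v = Word2Vec(sentences=filtered, vector_size=EMBED_DIM, min_count=1)

def embed(doc):
    vecs = [w2v.wv[w] for w in doc if w in w2v.wv][:MAX_LEN]
    pad = [np.zeros(EMBED_DIM)] * (MAX_LEN - len(vecs))
    return np.stack(vecs + pad)

X = np.stack([embed(d) for d in filtered])

# 3) CNN layer extracts local n-gram features; LSTM models the sequence.
model = models.Sequential([
    layers.Input(shape=(MAX_LEN, EMBED_DIM)),
    layers.Conv1D(64, 3, activation="relu"),
    layers.MaxPooling1D(2),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, labels, epochs=2, verbose=0)
```

Because low-weight terms are removed before embedding, the padded sequences are shorter and denser than with the raw text, which is where the parameter and training-time savings reported in the abstract come from.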
