Abstract

Traditional manual text classification methods can no longer cope with today's huge volumes of data, and advances in deep learning have accelerated text classification technology. Against this background, we present different word embedding methods such as word2vec, doc2vec, TF-IDF and an embedding layer. After word embedding, we demonstrate 8 deep learning models that classify news text automatically and compare the accuracy of all the models; the '2 layer GRU model with pretrained word2vec embeddings' achieved the highest accuracy. Automatic text classification can help people summarize text accurately and quickly from the mass of text information. Whether in academia or in industry, it is a topic worth discussing.

Highlights

  • In recent years, with the rapid development of Internet technology and information technology, and especially with the arrival of the era of big data, a huge amount of data is flooding every field of our life

  • Text classification is the process of assigning labels to text according to its content; it is one of the fundamental tasks in natural language processing (NLP)

  • NLP methods convert human language into numerical vectors for machines to compute on; with these word embeddings, researchers can perform tasks such as sentiment analysis, machine translation and natural language inference

Summary

Introduction

With the rapid development of Internet technology and information technology, especially with the arrival of the era of big data, a huge amount of data is flooding every field of our life. NLP methods convert human language into numerical vectors for machines to compute on; with these word embeddings, researchers can perform tasks such as sentiment analysis, machine translation and natural language inference. Each sample is represented by one vector, and the dimension of that vector can be set by the user. Both word2vec and doc2vec are unsupervised learning methods, and doc2vec was developed on the basis of word2vec. The doc2vec method trains a model on the corpus and uses that model to map every sample to a fixed-dimension vector. The same corpus is used for the word2vec method; after the word2vec model is generated, every word is mapped to a 100-dimension vector. In the embedding layer of a deep learning model, the words are trained and transferred to the layer; the dimension of each word is also set to 100 here, the same as in the word2vec method. The dimension of each sample after these 4 methods differs greatly, as Table 1 shows.
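The contrast in sample dimensions described above can be sketched in plain Python. This is a minimal illustration, not the paper's pipeline: the toy corpus, the stand-in "word vectors", and all function names are assumptions for demonstration. A TF-IDF vector has one dimension per vocabulary word, while averaging (hypothetical) 100-dimension word vectors yields a fixed 100-dimension vector per sample regardless of its length, which is the word2vec-style representation the paragraph describes.

```python
import math
from collections import Counter

# Toy corpus; the paper would use the news dataset instead.
corpus = [
    "stocks rally as markets open",
    "team wins the championship game",
    "markets fall on rate fears",
]
tokenized = [doc.split() for doc in corpus]
vocab = sorted({w for doc in tokenized for w in doc})

def tfidf_vector(tokens, docs, vocab):
    """TF-IDF: one dimension per vocabulary word, so the vector
    length equals the vocabulary size."""
    n = len(docs)
    counts = Counter(tokens)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for d in docs if w in d)
        idf = math.log(n / df) if df else 0.0
        vec.append(tf * idf)
    return vec

# Stand-in for trained word2vec vectors: every word gets an arbitrary
# 100-dimension vector; a real run would learn these with gensim.
DIM = 100
word_vecs = {w: [(hash(w) % 7) / 7.0] * DIM for w in vocab}

def doc_vector(tokens, word_vecs, dim=DIM):
    """Average the word vectors: every sample maps to the same fixed
    dimension (100 here) no matter how many words it contains."""
    vec = [0.0] * dim
    for w in tokens:
        for i, x in enumerate(word_vecs.get(w, [0.0] * dim)):
            vec[i] += x
    return [x / max(len(tokens), 1) for x in vec]

print(len(tfidf_vector(tokenized[0], tokenized, vocab)))  # vocabulary size
print(len(doc_vector(tokenized[0], word_vecs)))           # always 100
```

On a real vocabulary of tens of thousands of words the TF-IDF vector would be correspondingly long and sparse, which is why Table 1 shows such different dimensions across the 4 methods.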

Deep learning classification models
Results
Conclusion