Abstract

Text classification is an important task in natural language processing, as the massive amount of valuable textual information on the Internet needs to be sorted into categories for further use. To classify text more effectively, this paper builds a deep learning model that achieves better classification results on Chinese text than other researchers' models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as the deep learning methods for classifying Chinese text. LSTM is a special kind of recurrent neural network (RNN) that can process serialized information through its recurrent structure. By contrast, CNN has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model, the BLSTM-C model (BLSTM stands for bi-directional long short-term memory, while C stands for CNN). The BLSTM was responsible for producing a sequence output based on both past and future contexts, which was then fed into the convolutional layer for feature extraction. In our experiments, the proposed BLSTM-C model was evaluated in several ways. The results show that the model exhibits remarkable performance in text classification, especially on Chinese texts.

Highlights

  • Driven by the development of Internet technology and the progress of mobile social networking platforms, the amount of textual information on the Internet is growing rapidly

  • To validate the ability of our model on different languages, our bidirectional long short-term memory (BLSTM)-C model is compared with a simple long short-term memory (LSTM) model on the English news dataset as well as the Chinese news dataset that has the same categories as the English one

  • This paper mainly introduces a combined model called BLSTM-C that is made up of a bi-directional LSTM layer and a convolutional layer

Summary

Introduction

With the development of Internet technology and the progress of mobile social networking platforms, the amount of textual information on the Internet is growing rapidly. Traditional text classification generally adopts machine-learning-based methods, including naive Bayes, support vector machines, and k-nearest neighbors, whose performance depends mainly on the quality of hand-crafted features. In natural language processing, a CNN is able to extract n-gram features from different positions of a sentence through convolutional filters, and it learns both short- and long-range relations through pooling operations. In our model, the BLSTM is employed first to capture long-term sentence dependencies, and the CNN is then adopted to extract features for sequence modeling tasks. It turns out that our model is particularly well suited to the Chinese language. Our evaluation shows that the BLSTM-C model achieves remarkable results and outperforms a wide range of baseline models.
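The data flow described above (word embeddings → bidirectional LSTM → convolution → pooling → classifier) can be sketched as a single forward pass in NumPy. This is a minimal illustration with random, untrained weights; the vocabulary size, embedding width, hidden size, filter count, and class count below are arbitrary assumptions for demonstration, not the paper's actual hyper-parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x, hidden):
    """Run a single-direction LSTM over x (seq_len, d_in); return all hidden states."""
    seq_len, d_in = x.shape
    # one stacked weight matrix for the four gates: input, forget, cell, output
    W = rng.normal(0, 0.1, (4 * hidden, d_in + hidden))
    b = np.zeros(4 * hidden)
    h, c = np.zeros(hidden), np.zeros(hidden)
    outs = []
    for t in range(seq_len):
        z = W @ np.concatenate([x[t], h]) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
        outs.append(h)
    return np.stack(outs)                              # (seq_len, hidden)

def blstm_c_forward(token_ids, vocab=1000, embed=32, hidden=16,
                    filters=8, width=3, classes=5):
    """Toy BLSTM-C forward pass: embeddings -> BLSTM -> 1-D conv -> max-pool -> softmax."""
    E = rng.normal(0, 0.1, (vocab, embed))
    x = E[token_ids]                                   # (seq_len, embed)
    fwd = lstm_layer(x, hidden)                        # past context
    bwd = lstm_layer(x[::-1], hidden)[::-1]            # future context
    seq = np.concatenate([fwd, bwd], axis=1)           # (seq_len, 2*hidden)
    # convolutional layer extracts n-gram-like features from the BLSTM output
    K = rng.normal(0, 0.1, (filters, width, 2 * hidden))
    conv = np.array([[np.sum(K[f] * seq[t:t + width]) for f in range(filters)]
                     for t in range(len(seq) - width + 1)])
    pooled = conv.max(axis=0)                          # global max pooling
    Wc = rng.normal(0, 0.1, (classes, filters))
    logits = Wc @ pooled
    p = np.exp(logits - logits.max())
    return p / p.sum()                                 # class probabilities

probs = blstm_c_forward(rng.integers(0, 1000, size=20))
print(probs.shape, float(probs.sum()))
```

In practice these layers would be built with a deep learning framework and trained end to end; the sketch only shows how the bidirectional sequence output is consumed by the convolutional layer.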

Related Work
Input Layer
Remove The Stop Words
Segmentation
BLSTM Layer
Convolutional Neural Networks Layer
Proposed BLSTM-C Model
Datasets
Word Vector Initialization and Padding
Hyper-Parameter Setting
Overall Performance
Comparison between English and Chinese
Performance Analysis
Conclusions