Abstract

Short text classification is an important task in Natural Language Processing (NLP). Classification results for Chinese short text are often unsatisfactory because such text is sparse. Most previous classification models for Chinese short text are based on words or characters; considering that Chinese radicals can also carry meaning on their own, this paper builds a Chinese short text classification model on words, characters and radicals, which alleviates the data sparsity problem of short text. In addition, when segmenting sentences into words, jieba can lose key information while n-gram segmentation generates noise words, so both jieba and n-gram are used to construct a six-granularity (i.e. word-jieba, word-jieba-radical, word-ngram, word-ngram-radical, character and character-radical) based Chinese short text classification (SGCSTC) model. Because the six granularities influence the classification result to different degrees, each granularity is assigned a weight that is updated automatically during back-propagation with the cross-entropy loss. On the THUCNews-S dataset, SGCSTC achieves an Accuracy, Precision, Recall and F1 of 93.36%, 94.47%, 94.15% and 94.31%, respectively; on the CNT dataset, the corresponding values are 92.67%, 92.38%, 93.15% and 92.76%. Multiple comparative experiments on the THUCNews-S and CNT datasets show that SGCSTC outperforms state-of-the-art text classification models.
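
To illustrate the fusion mechanism described above, the following is a minimal sketch (not the authors' code) of combining six granularity-level representations with learnable weights trained by back-propagation under a cross-entropy loss. The encoder structure, hidden dimension and class names are assumptions for illustration only.

```python
# Hypothetical sketch of six-granularity fusion with learnable weights.
import torch
import torch.nn as nn

class SixGranularityClassifier(nn.Module):
    def __init__(self, num_classes, hidden_dim=128, num_granularities=6):
        super().__init__()
        # One encoder per granularity (word-jieba, word-jieba-radical,
        # word-ngram, word-ngram-radical, character, character-radical).
        # A simple linear layer stands in for each granularity encoder here.
        self.encoders = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_granularities)
        )
        # Learnable granularity weights, updated automatically by back-propagation.
        self.granularity_weights = nn.Parameter(torch.ones(num_granularities))
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, granularity_inputs):
        # granularity_inputs: list of six tensors, each of shape (batch, hidden_dim)
        encoded = [enc(x) for enc, x in zip(self.encoders, granularity_inputs)]
        weights = torch.softmax(self.granularity_weights, dim=0)
        fused = sum(w * h for w, h in zip(weights, encoded))
        return self.classifier(fused)

# Training step: the cross-entropy loss updates both the encoders
# and the granularity weights.
model = SixGranularityClassifier(num_classes=10)
inputs = [torch.randn(4, 128) for _ in range(6)]
labels = torch.randint(0, 10, (4,))
loss = nn.CrossEntropyLoss()(model(inputs), labels)
loss.backward()
```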
