Research on performance variations of classifiers with the influence of pre-processing methods for Chinese short text classification.

Dezheng Zhang, Jing Li, Aziguli Wulamu, Yonghong Xie

doi:10.1371/journal.pone.0292582

Abstract

Text preprocessing is an important component of Chinese text classification. At present, however, most studies on this topic explore the influence of preprocessing methods on only a few text classification algorithms, using English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then explored the influence of the preprocessing methods on the final classifications under various conditions, such as classification evaluation, combination style, and classifier selection. Finally, we conducted a battery of additional experiments and found that most of the classifiers improved in performance after proper preprocessing was applied. Our general conclusion is that the systematic use of preprocessing methods can have a positive impact on the classification of Chinese short text, given an appropriate classification evaluation metric such as macro-F1, a combination of preprocessing methods such as word segmentation with Chinese-specific stop word and symbol removal, and a classifier selection that covers machine learning and deep learning models. We find that the best macro-F1 scores for the two datasets are 92.13% and 91.99%, which represent improvements of 0.3% and 2%, respectively, over the compared baselines.
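The three preprocessing steps and the macro-F1 metric named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say which segmenter, stop word list, or symbol set was used, so the toy dictionary `VOCAB`, the list `STOP_WORDS`, the character filter `SYMBOLS`, and the use of forward maximum matching as the segmentation method are all assumptions made for illustration.

```python
import re

# Hypothetical toy resources (assumptions, not from the paper); a real study
# would use a full segmenter dictionary and a published stop word list.
VOCAB = {"我们", "喜欢", "自然", "语言", "自然语言", "处理", "的"}
STOP_WORDS = {"的", "了", "是"}
MAX_WORD_LEN = max(len(w) for w in VOCAB)

# Assumed symbol filter: keep only CJK ideographs, ASCII letters, and digits.
SYMBOLS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]")


def remove_symbols(text):
    """Drop punctuation and other symbols, including full-width ones."""
    return SYMBOLS.sub("", text)


def segment(text):
    """Forward maximum matching: greedily take the longest dictionary word.

    Characters not covered by the dictionary become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens):
    """Filter out tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Symbol removal, then segmentation, then stop word removal."""
    return remove_stop_words(segment(remove_symbols(text)))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(preprocess("我们喜欢自然语言处理的！"))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro-F1 averages per-class F1 without weighting by class frequency, a small class counts as much as a large one, which is one plausible reason to prefer it on datasets with uneven category sizes.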

Journal: PloS one	Publication Date: Oct 12, 2023
Citations: 1	License type: CC BY 4.0

Similar Papers

Chinese short news text classification based on BERT and sparse autoencoder
Jiuzhou Lin
30 Nov 2022

Six-Granularity Based Chinese Short Text Classification
Xinjie Sun ... Xingying Huo
IEEE Access | VOL. 11
01 Jan 2023

Chinese Short Text Classification Based On Deep Learning
Xi He ... Jianping Li
17 Dec 2021

Review of Chinese Short Text Classification
Fen Lin Wu ... Cheng Wang
Applied Mechanics and Materials | VOL. 336-338
01 Jul 2013
