N-grams based feature selection and text representation for Chinese Text Classification

Duoqian Miao, Jean-Hugues Chauchat, Rui Zhao, Wen Li, Zhihua Wei

doi:10.2991/ijcis.2009.2.4.5


Open Access

https://doi.org/10.2991/ijcis.2009.2.4.5



Highlights

With the rapidly increasing quantity of web sources and electronic texts in Chinese, much attention has been paid to Chinese text classification (TC).
We discussed Chinese text classification based on n-grams, using different feature selection methods and different text representation weights.
When fewer than 3000 features are used, the feature selection methods based on n-gram frequency always give better results than those based on text frequency.
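The contrast between the two selection criteria in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that "n-gram frequency" means the total number of occurrences of an n-gram across the corpus, that "text frequency" means the number of texts containing it, and that features are character bigrams. The names `bigrams` and `rank_features` are hypothetical.

```python
from collections import Counter

def bigrams(text):
    """Overlapping character bigrams of a string (assumed feature unit)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def rank_features(docs, k, by="ngram"):
    """Keep the k highest-scoring bigrams.

    by="ngram": score = total occurrences over all texts (assumed meaning
                of "n-gram frequency").
    by="text":  score = number of texts containing the bigram (assumed
                meaning of "text frequency").
    """
    total = Counter()     # occurrences summed over the corpus
    doc_freq = Counter()  # number of texts in which the bigram appears
    for doc in docs:
        grams = bigrams(doc)
        total.update(grams)
        doc_freq.update(set(grams))
    scores = total if by == "ngram" else doc_freq
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]
```

The two criteria diverge when a bigram is repeated many times within a few texts: it scores high on total occurrences but low on the number of texts containing it.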

Summary

Introduction

With the rapidly increasing quantity of web sources and electronic texts in Chinese, much attention has been paid to Chinese text classification (TC). In addition to the difficulties it shares with text classification in English, Chinese TC exhibits the following difficulties: (1) there is no space between words in Chinese text. In a TC task, the term can be a word, a character, or an n-gram. These features play the same role in Chinese TC. Unlike most Western languages, Chinese has no explicit word boundaries. This means that word segmentation is necessary before any other preprocessing. Word sense disambiguation and unknown word recognition limit the precision of word segmentation.
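To illustrate why n-grams sidestep segmentation, the sketch below extracts overlapping character n-grams directly from raw Chinese text, with no word boundaries required. It is only an illustration under stated assumptions: the function name `char_ngrams` is hypothetical, whitespace is dropped before extraction, and the preprocessing actually used in the paper is not described on this page.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams of `text`.

    Whitespace is removed first (an assumption made for this sketch),
    so no word segmentation step is needed.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

if __name__ == "__main__":
    # Bigrams of a six-character string ("Chinese text classification").
    print(char_ngrams("中文文本分类", 2))
```

A text shorter than `n` characters simply yields an empty list, so very short texts contribute no features at that n-gram order.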



Journal: International Journal of Computational Intelligence Systems
Publication Date: Jan 1, 2009
Citations: 3
License type: cc-by


