A Text Classification Algorithm Based on PCA

Jian-Lin Li

doi:10.12783/dtcse/cst2017/12555

Abstract

This paper studies feature extraction algorithms for Web text. It examines mutual information (MI), document frequency (DF), information gain (IG) and χ² statistics (CHI), and, exploiting their complementary strengths, proposes a multiple combination feature extraction algorithm based on principal component analysis (PCA-MCFEA). First, the orthogonal transformation of PCA rapidly reduces the dimensionality of the text feature space. Next, the multiple combination feature extraction algorithm quickly extracts the more representative features in the lower-dimensional feature space and filters out weakly representative feature items. Finally, an SVM classifier classifies the texts. The experimental results show that PCA-MCFEA can effectively improve both text classification accuracy and running efficiency.
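The abstract does not say how the four metrics are combined. As a minimal sketch, assuming the standard per-term definitions of DF, MI, IG and χ² (with MI and χ² taken as the maximum over classes) and a hypothetical combination rule that averages min-max normalised scores, the selection step might look like the following. It scores raw terms only and leaves out the PCA and SVM stages; `feature_scores` and `combined_selection` are illustrative names, not the author's code.

```python
import math
from collections import Counter


def feature_scores(docs, labels):
    """Return {term: {"DF", "MI", "IG", "CHI"}} for a labelled corpus.

    docs   -- list of token lists, one per document
    labels -- list of class labels, same length as docs (at least two classes)
    """
    n = len(docs)
    class_sizes = Counter(labels)
    # contains[term][c] = number of documents of class c that contain term
    contains = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            contains.setdefault(term, Counter())[label] += 1

    h_class = -sum((s / n) * math.log(s / n) for s in class_sizes.values())
    scores = {}
    for term, per_class in contains.items():
        df = sum(per_class.values())      # document frequency of the term
        mi_best, chi_best, h_cond = float("-inf"), 0.0, 0.0
        for c, size in class_sizes.items():
            a = per_class[c]              # class c, term present
            b = df - a                    # other classes, term present
            c_ = size - a                 # class c, term absent
            d = n - df - c_               # other classes, term absent
            if a > 0:                     # MI(t, c) = log(A*N / ((A+C)(A+B)))
                mi_best = max(mi_best, math.log(a * n / (size * df)))
            denom = size * (n - size) * df * (n - df)
            if denom:                     # chi-square statistic for (t, c)
                chi_best = max(chi_best, n * (a * d - c_ * b) ** 2 / denom)
            # conditional entropy H(C | T) for information gain
            if a:
                h_cond -= (a / n) * math.log(a / df)
            if c_:
                h_cond -= (c_ / n) * math.log(c_ / (n - df))
        scores[term] = {"DF": df, "MI": mi_best,
                        "IG": h_class - h_cond, "CHI": chi_best}
    return scores


def combined_selection(docs, labels, k):
    """Keep the k terms with the highest combined score.

    ASSUMPTION: each metric is min-max normalised to [0, 1] and the four are
    averaged; the abstract does not specify the PCA-MCFEA combination rule.
    """
    scores = feature_scores(docs, labels)
    terms = list(scores)
    combined = {t: 0.0 for t in terms}
    for metric in ("DF", "MI", "IG", "CHI"):
        values = [scores[t][metric] for t in terms]
        lo, hi = min(values), max(values)
        for t in terms:
            norm = (scores[t][metric] - lo) / (hi - lo) if hi > lo else 0.0
            combined[t] += norm / 4
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(terms, key=lambda t: (-combined[t], t))[:k]


docs = [["cheap", "pills", "offer"], ["cheap", "offer", "now"],
        ["meeting", "agenda", "notes"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
print(combined_selection(docs, labels, 4))  # ['cheap', 'meeting', 'notes', 'offer']
```

Min-max normalisation puts the metrics on a common scale before averaging, since raw DF counts and χ² values can differ by orders of magnitude. On this toy corpus, terms that occur in every document of one class and in no document of the other rank first, while terms seen only once score zero on DF, IG and χ² after normalisation.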
