Comparative Study on Feature Selection in Uighur Text Categorization

Yang Yong, Xue Hua Jian, Dong Xin Hua, Li Xiao

doi:10.4156/aiss.vol4.issue3.3

Abstract

This paper studies feature selection methods for Uighur text categorization. Because Uighur is an agglutinative language, three experiments were designed to examine how different feature selection methods affect classification accuracy. The first experiment applies the traditional methods, namely document frequency (DF), information gain (IG), mutual information (MI), and the chi-square statistic (CHI), to text with stem segmentation; the best classification accuracy, 91.34%, is obtained with DF. In the second experiment, in which stems are not segmented, the best accuracy is 88.03%, obtained with CHI. The third experiment uses combined feature selection methods, DF+IG, DF+MI, and DF+CHI; DF+CHI reaches an accuracy of 93.57%, the best result across all experiments.
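As an illustration of the combined approach, the following is a minimal sketch of one possible DF+CHI scheme. The abstract does not state how the two methods are combined, so the two-stage design here (a DF threshold followed by chi-square ranking) is an assumption, as are the function names, the `min_df` and `top_k` parameters, and the toy corpus, whose English tokens stand in for segmented Uighur stems.

```python
from collections import Counter

def df_filter(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (DF stage)."""
    df = Counter(term for tokens, _ in docs for term in set(tokens))
    return {t for t, n in df.items() if n >= min_df}

def chi_square(docs, term, label):
    """Chi-square statistic of a term with respect to one class."""
    a = b = c = d = 0
    for tokens, y in docs:
        has = term in tokens
        if has and y == label:
            a += 1          # term present, document in class
        elif has:
            b += 1          # term present, document not in class
        elif y == label:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def df_chi_select(docs, min_df=2, top_k=2):
    """Hypothetical DF+CHI combination: DF prefilter, then rank by max CHI."""
    labels = {y for _, y in docs}
    candidates = df_filter(docs, min_df)
    scores = {t: max(chi_square(docs, t, y) for y in labels) for t in candidates}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Toy corpus (assumed data): each document is (list of stems, class label).
docs = [
    (["sport", "team", "news"], "sports"),
    (["sport", "match", "news"], "sports"),
    (["market", "bank", "news"], "finance"),
    (["market", "price", "news"], "finance"),
]
selected = df_chi_select(docs)
```

In this sketch the DF stage discards stems that appear in only one document, and the CHI stage then drops "news", which appears in every document and therefore carries no class information.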

Full Text