An Approach for Text Classification Feature Dimensionality Reduction and Rule Generation on Rough Set

Shiqun Yin, Yuhui Qiu, Zhixing Huang, Lu Chen

doi:10.1109/icicic.2008.7

Abstract

High-dimensional data are frequently encountered in Web text classification. Mining such data is extraordinarily difficult because of the curse of dimensionality, so feature dimensionality reduction is needed. This paper presents an attribute reduction algorithm based on rough set theory that reduces the set of text feature terms and extracts classification rules. First, the weights of the feature terms are discretized. Then, a decision table is built with the discretized weights as the condition attributes and the text classes as the decision attributes. Finally, the classification rules are extracted by attribute reduction. The method is simple and feasible. It helps improve the efficiency of the selected feature subset and is suitable for high-volume text classification. The extracted rules are easy to understand. Classification is more accurate and faster than classification based on vector space comparison. This paper describes the proposed technique and provides experimental results.
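
The three steps named in the abstract (discretize the weights, build the decision table, reduce the attributes and read off rules) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the cut points, the greedy dependency-based reduct search, the toy weights, and all function names (`discretize`, `dependency`, `greedy_reduct`, `extract_rules`) are assumptions chosen only for illustration.

```python
from collections import defaultdict

def discretize(weight, cuts=(0.1, 0.5)):
    """Map a continuous term weight to a level 0, 1, 2 (cut points are assumed)."""
    return sum(weight >= c for c in cuts)

def dependency(rows, labels, attrs):
    """Fraction of rows whose values on `attrs` determine the class uniquely,
    i.e. the size of the rough-set positive region divided by the table size."""
    classes = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(classes[key]) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Greedy forward selection (an assumed heuristic): add the attribute that
    raises the dependency most until it matches that of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in all_attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)

def extract_rules(rows, labels, reduct):
    """One rule per consistent value combination of the reduct attributes."""
    seen = defaultdict(set)
    for row, label in zip(rows, labels):
        seen[tuple(row[a] for a in reduct)].add(label)
    return {key: next(iter(v)) for key, v in seen.items() if len(v) == 1}

# Toy term weights for four documents over three feature terms (made-up values).
weights = [
    [0.80, 0.30, 0.00],
    [0.60, 0.20, 0.05],
    [0.00, 0.30, 0.70],
    [0.05, 0.20, 0.90],
]
labels = ["sports", "sports", "finance", "finance"]

# Decision table: discretized weights are the condition attributes,
# the text classes are the decision attribute.
table = [[discretize(w) for w in doc] for doc in weights]
reduct = greedy_reduct(table, labels)
rules = extract_rules(table, labels, reduct)
print(reduct)  # [0]
print(rules)   # {(2,): 'sports', (0,): 'finance'}
```

In this toy table the first term alone separates the two classes, so the other two terms are dropped and each remaining rule reads as "if term 0 has level 2 then sports; if level 0 then finance".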

Full Text