Abstract

As the number of electronic documents continues to grow, automatic text categorization has become a research hot spot in data mining. Feature selection is a crucial step in automatic text categorization, because the chosen feature subset strongly influences the classification result. However, traditional single feature selection algorithms rest on different calculation principles and therefore often select different feature subsets. Each algorithm also scores features from only one perspective, which reduces classification accuracy. This paper therefore proposes a mixed algorithm, CHIECE, which combines the advantages of the CHI and ECE algorithms to select a more representative feature subset. The performance of the mixed text feature selection algorithm is evaluated with three widely used classifiers and compared with the traditional feature selection algorithms information gain (IG), chi-square statistic (CHI) and expected cross entropy (ECE). The results show that CHIECE achieves higher accuracy than the traditional algorithms.
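To make the idea of mixing two feature scores concrete, the following is a minimal sketch and not the paper's method. The abstract does not state how CHI and ECE are combined, so the combination rule used here (min-max normalization followed by an equal-weight average) is purely an assumption, as are the function names `chi_and_ece`, `min_max` and `select_features`. CHI is computed from the standard 2x2 contingency table and reduced to one value per term by taking the maximum over classes, which is also an assumed choice; ECE uses the usual definition P(t) * sum over c of P(c|t) * log(P(c|t) / P(c)).

```python
import math
from collections import Counter, defaultdict


def chi_and_ece(docs, labels):
    """Return ({term: chi}, {term: ece}) for tokenized docs and their labels."""
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                   # number of documents containing each term
    df_class = defaultdict(Counter)  # df_class[term][cls]: same, per class
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[term][cls] += 1

    chi, ece = {}, {}
    for term, n_t in df.items():
        best_chi = 0.0
        cross = 0.0
        for cls, n_c in class_count.items():
            a = df_class[term][cls]  # in class, term present
            b = n_t - a              # other classes, term present
            c = n_c - a              # in class, term absent
            d = n - n_t - c          # other classes, term absent
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best_chi = max(best_chi, n * (a * d - b * c) ** 2 / denom)
            if a:
                p_c_given_t = a / n_t
                cross += p_c_given_t * math.log(p_c_given_t / (n_c / n))
        chi[term] = best_chi         # assumption: maximum over classes
        ece[term] = (n_t / n) * cross
    return chi, ece


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def select_features(docs, labels, k):
    """Pick the k terms with the highest averaged CHI/ECE score.

    Assumption: equal-weight average of normalized scores; the paper's
    actual CHIECE combination rule is not given in the abstract.
    """
    chi, ece = chi_and_ece(docs, labels)
    chi_n, ece_n = min_max(chi), min_max(ece)
    combined = {t: 0.5 * (chi_n[t] + ece_n[t]) for t in chi}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]
```

Calling `select_features(docs, labels, k)` with a list of token lists and a parallel list of class labels returns the `k` highest-ranked terms. A term that appears equally often in every class receives a score of zero under both measures, so it is ranked last.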
