Abstract

In this paper, we focus on feature coverage policies used for feature selection in the text classification domain. Two alternative policies are discussed and compared: corpus-based and class-based selection of features. We analyze pruning and keyword selection in detail by varying the parameters of each policy and identify their optimal usage patterns. In addition, by combining the optimal forms of these methods, we propose a novel two-stage feature selection approach. Experiments on three independent datasets show that the proposed method yields a statistically significant improvement in classifier success rates over the traditional methods.
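
The abstract does not give implementation details, but the following minimal Python sketch illustrates the general two-stage idea it describes: a corpus-based pruning pass followed by class-based keyword selection over the pruned vocabulary. The function names, the document-frequency threshold, and the per-class relative-frequency score used here are illustrative assumptions, not the paper's actual policies or metrics.

```python
from collections import Counter, defaultdict

def corpus_pruning(docs, min_df=2):
    """Corpus-based policy (assumed form): drop terms that occur in
    fewer than min_df documents across the whole corpus."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t for t, c in df.items() if c >= min_df}

def class_keyword_selection(docs, labels, vocab, k_per_class=100):
    """Class-based policy (assumed form): keep the top-k terms per class,
    ranked by a simple class-conditional relative-frequency score.
    This score is a placeholder; the abstract does not name the metric."""
    class_tf = defaultdict(Counter)
    total_tf = Counter()
    for tokens, y in zip(docs, labels):
        kept = [t for t in tokens if t in vocab]
        class_tf[y].update(kept)
        total_tf.update(kept)
    selected = set()
    for y, tf in class_tf.items():
        ranked = sorted(vocab, key=lambda t: tf[t] / (total_tf[t] + 1e-9), reverse=True)
        selected.update(ranked[:k_per_class])
    return selected

def two_stage_selection(docs, labels, min_df=2, k_per_class=100):
    """Two-stage combination: corpus-based pruning first, then
    class-based keyword selection on the surviving terms."""
    pruned_vocab = corpus_pruning(docs, min_df=min_df)
    return class_keyword_selection(docs, labels, pruned_vocab, k_per_class=k_per_class)

if __name__ == "__main__":
    # Toy usage example with hypothetical tokenized documents and labels.
    docs = [["price", "stock", "market"], ["goal", "match", "stock"],
            ["market", "shares", "price"], ["match", "team", "goal"]]
    labels = ["finance", "sports", "finance", "sports"]
    print(two_stage_selection(docs, labels, min_df=2, k_per_class=2))
```

The resulting term set would then define the feature space given to the classifier; the actual thresholds and ranking metrics would have to be taken from the full text of the paper.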
