Abstract

In this paper, we introduce an alternative framework for selecting a most relevant subset of the original set of features for the purpose of text categorization. Given a feature set and a local feature evaluation function (such as chi-square measure, mutual information etc.,) the proposed framework ranks the features in groups instead of ranking individual features. A group of features with rth rank is more powerful than the group of features with (r+1)th rank. Each group is made up of a subset of features which are supposed to be capable of discriminating every class from every other class. The added advantage of the proposed framework is that it automatically eliminates the redundant features while selecting features without requirement of study of features in combination. Further the proposed framework also helps in handling overlapping classes effectively through selection of low ranked yet powerful features. An extensive experimentation has been conducted on three benchmarking datasets using four different local feature evaluation functions with Support Vector Machine and Naïve Bayes classifiers to bring out the effectiveness of the proposed framework over the respective conventional counterparts.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.