Abstract

High dimensionality of the feature space is one of the problems in text classification. Identifying an optimal subset of features can improve the text classification process in terms of both processing time and performance. In this paper, we propose a novel Relevant-Based Feature Ranking (RBFR) algorithm, which identifies and selects smaller subsets of more relevant features in the feature space. We compared the performance of RBFR against existing feature selection methods, namely balanced accuracy measure, information gain, Gini index, and odds ratio, on three datasets: 20 Newsgroups, Reuters, and WAP. We used five machine learning models (SVM, NB, kNN, RF, and LR) to test and evaluate the proposed feature selection method. We found that the proposed feature selection method is 25.4305% more effective than the existing feature selection methods in terms of accuracy.
