Abstract
Feature selection plays a vital role in boosting the performance of a classifier. Its aim is to remove irrelevant features and retain only highly discriminative ones, thereby improving classification performance. This paper compares the performance of nine popular feature ranking (FR) metrics on six benchmark datasets using Naive Bayes (NB) and Support Vector Machine (SVM) classifiers; the results of these comparisons are reported in both tabular and graphical form. The nine FR metrics are: Chi-squared (CHI), Odds Ratio (OR), Information Gain (IG), Gini Index (GINI), Poisson Ratio (POIS), Normalized Difference Measure (NDM), Balanced Accuracy Measure (ACC2), Distinguishing Feature Selector (DFS), and Bi-Normal Separation (BNS). The six datasets used for empirical evaluation are WAP, K1a, RE0, RE1, Spam, and Enron, of which two (RE0 and K1a) are highly skewed. Classifier performance is evaluated with the widely used micro- and macro-averaged F1 measures.
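To make the evaluation protocol described above concrete, the following is a minimal sketch (not the authors' code) of the general pipeline: rank features with one of the listed metrics (Chi-squared is used here because it is available in scikit-learn), keep the top-k terms, train a Naive Bayes classifier, and score it with micro- and macro-averaged F1. The toy corpus and the value of k are placeholder assumptions; the paper's datasets (WAP, K1a, RE0, RE1, Spam, Enron) are not reproduced here.

```python
# Illustrative sketch of feature ranking + NB classification + micro/macro F1.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in labelled text corpus (assumption: any labelled corpus works here).
data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))
X_text, y = data.data, data.target

# Bag-of-words term counts.
X = CountVectorizer(stop_words="english").fit_transform(X_text)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Rank features by the Chi-squared (CHI) metric and keep the top 500
# (k = 500 is an arbitrary illustrative choice).
selector = SelectKBest(chi2, k=500).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Train Naive Bayes and report micro- and macro-averaged F1.
clf = MultinomialNB().fit(X_tr_sel, y_tr)
pred = clf.predict(X_te_sel)
print("micro-F1:", f1_score(y_te, pred, average="micro"))
print("macro-F1:", f1_score(y_te, pred, average="macro"))
```

Swapping `chi2` for another ranking function (and an SVM for the Naive Bayes classifier) follows the same pattern; the other metrics named in the abstract are not shipped with scikit-learn and would need to be implemented separately.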