Abstract

Feature selection methods are deployed in machine-learning pipelines to reduce redundancy in the dataset and to increase the clarity of the resulting models without significant loss of information. The objective of this paper is to investigate the performance of feature selection methods when they are applied to different datasets and different classification algorithms. We evaluate standard metrics such as accuracy, precision and recall for two feature selection algorithms, namely Chi-Square feature selection and Boruta feature selection. Experiments conducted in RStudio showed an increase of around 5–6% in the above metrics when the Boruta feature selection algorithm was used. The experiments were run on two datasets with different sets of features, using the following five standard classification algorithms: Naive Bayes, decision tree, support vector machines (SVM), random forest and gradient boosting.
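
As a minimal sketch of the workflow the abstract describes, the R snippet below runs Boruta feature selection and then trains one of the five classifiers (random forest) on the selected features. The built-in iris data is used here only as a stand-in, since the paper's two datasets are not specified in this excerpt; the Boruta, getSelectedAttributes and randomForest calls are the standard APIs of the Boruta and randomForest packages.

    # Sketch only: iris stands in for the paper's (unspecified) datasets
    library(Boruta)
    library(randomForest)

    set.seed(42)
    boruta_result <- Boruta(Species ~ ., data = iris)   # Boruta feature selection
    selected <- getSelectedAttributes(boruta_result)    # features confirmed as important

    # Train a random forest classifier on the selected features only
    model <- randomForest(reformulate(selected, response = "Species"), data = iris)
    print(model)   # OOB error and confusion matrix, from which accuracy/precision/recall follow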
