Abstract

Redundant and irrelevant features degrade the accuracy of a classifier. Feature selection techniques are used to avoid these redundancy and irrelevancy problems. Finding the most relevant feature subset, one that enhances the accuracy of the classifier, is among the most challenging parts of the task. This paper presents a new solution to finding relevant feature subsets: the niche-based bat algorithm (NBBA). It is compared with existing state-of-the-art approaches, including evolutionary ones. The multi-objective bat algorithm (MOBA) selected 8, 16, and 248 features with 93.33%, 93.54%, and 78.33% accuracy on the Ionosphere, Sonar, and Madelon datasets, respectively. The multi-objective genetic algorithm (MOGA) selected 10, 17, and 256 features with 91.28%, 88.70%, and 75.16% accuracy on the same datasets, and the multi-objective particle swarm optimization (MOPSO) selected 9, 21, and 312 features with 89.52%, 91.93%, and 76% accuracy. In comparison, NBBA selected 6, 19, and 178 features with 93.33%, 95.16%, and 80.16% accuracy on the same datasets. The niche multi-objective genetic algorithm selected 8, 15, and 196 features with 93.33%, 91.93%, and 79.16% accuracy, and the niche multi-objective particle swarm optimization selected 9, 19, and 213 features with 91.42%, 91.93%, and 76.5% accuracy. Hence, the results show that MOBA outperformed MOGA and MOPSO, and NBBA outperformed the niche multi-objective genetic algorithm and the niche multi-objective particle swarm optimization.
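
Each result above pairs a subset size with a classification accuracy; that is, every algorithm is judged on two competing objectives: maximize accuracy and minimize the number of selected features. Below is a minimal Python sketch of such a bi-objective subset evaluation, assuming a k-NN classifier and 5-fold cross-validation purely as illustrative stand-ins (the abstract does not specify the classifier or validation protocol):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(mask, X, y):
    """Score one candidate subset on the two competing objectives:
    classification accuracy (to maximize) and subset size (to minimize)."""
    if not mask.any():                      # an empty subset cannot classify
        return 0.0, X.shape[1]
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return acc, int(mask.sum())

# Example: score one random candidate on a toy dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 34))              # 34 features, like Ionosphere
y = rng.integers(0, 2, size=100)
mask = rng.random(34) < 0.5                 # boolean vector: which features to keep
print(evaluate_subset(mask, X, y))
```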

Highlights

  • Feature selection is a method that selects a subset of relevant features from a dataset to achieve the best accuracy with a given classifier

  • There are several widely used feature selection techniques, such as the two-step protocol that uses the F-score for ranking features, as the F-score reveals the discriminative power of each feature independently (see the sketch after this list)

  • SAGA (Simulated Annealing Genetic Algorithm) [1], a name reflecting its two constituent meta-heuristic methods, simulated annealing (SA) and the genetic algorithm (GA), combines various qualities of existing algorithms to select ideal subsets from an extensive component space
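
As a rough illustration of the ranking step mentioned above, the sketch below computes the per-feature F-score for a binary-labelled dataset (the helper name f_score and the zero-variance guard are our additions; the paper's exact formulation may differ):

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a binary-labelled dataset: the ratio of
    between-class separation to within-class scatter, computed for each
    feature independently (higher = more discriminative)."""
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return between / (within + 1e-12)       # epsilon guards zero-variance features

# Step one of the two-step protocol: rank features by F-score.
# Step two (not shown) would evaluate top-ranked subsets with the classifier.
# ranking = np.argsort(f_score(X, y))[::-1]
```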


Summary

Introduction

Feature selection is a method that selects a subset of relevant features from a dataset to achieve the best accuracy with a given classifier. Feature selection serves two main purposes. First, it makes training easier: applying the classifier becomes more efficient because reducing the number of features shrinks the dataset. Second, feature selection usually eliminates noise in the dataset, which increases classification accuracy. Feature subset selection thus acts as a pre-processing step for classification: it selects the most relevant features, yielding higher classification accuracy and better time efficiency thanks to the reduced number of features.

Different Approaches for Feature Subset Selection

Filter Approach

This feature subset selection method is independent of the classifier used. Filter methods score features against the empirical distribution of the data, keeping strongly related features and filtering out weakly related ones during the evaluation step of the overall feature selection process.
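
A minimal sketch of the filter idea, assuming scikit-learn's mutual_info_classif as the relevance measure (both the measure and the threshold are our illustrative choices, not the paper's):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def filter_select(X, y, threshold=0.01):
    """Classifier-independent filter: score each feature against the
    labels and keep only those scoring above `threshold`."""
    scores = mutual_info_classif(X, y, random_state=0)
    keep = np.flatnonzero(scores > threshold)
    return X[:, keep], keep

# The reduced matrix can then be handed to any classifier; the filter
# itself never consults the classifier while selecting features.
```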

