Binary Social Mimic Optimization Algorithm With X-Shaped Transfer Function for Feature Selection

Kushal Kanti Ghosh, Junhee Hong, Pawan Kumar Singh, Zong Woo Geem, Ram Sarkar

doi:10.1109/access.2020.2996611

Open Access

https://doi.org/10.1109/access.2020.2996611

Abstract

Definitive optimization algorithms cannot solve high-dimensional optimization problems when the search space grows exponentially with the problem size, and an exhaustive search also becomes impractical. To counter this problem, researchers use approximation algorithms. One category of approximation algorithms is meta-heuristic algorithms, which have shown an acceptable degree of efficiency in solving this kind of problem. The Social Mimic Optimization (SMO) algorithm is a recently proposed meta-heuristic algorithm for optimizing problems with a continuous solution space. It is modeled on the behavior of people in society. SMO can efficiently explore the solution space to obtain an optimal or near-optimal solution by minimizing a given fitness function. Feature selection is a binary optimization problem where the aim is to maximize the classification accuracy of a learning algorithm using the minimum number of features. To convert the continuous search space to a binary one, a proper transfer function is required. The choice of transfer function strongly affects the binary variant of an optimization algorithm, because it determines which subset of features is selected from the solution values the algorithm attains in the continuous search space. To this end, we propose a new transfer function, namely the X-shaped transfer function, to enhance the exploration and exploitation ability of binary SMO. The proposed X-shaped transfer function utilizes two components and a crossover operation to obtain a new solution. The effect of the proposed X-shaped transfer function is compared with the effect of four S-shaped and four V-shaped transfer functions on SMO in terms of achieved classification accuracy, rate of convergence, and number of features selected over 18 standard UCI datasets. The proposed algorithm is also compared with state-of-the-art meta-heuristic feature selection (FS) algorithms. Experimental results confirm the efficiency of the proposed approach in improving the classification accuracy compared to other meta-heuristic algorithms, and the superiority of the X-shaped transfer function over commonly used S-shaped and V-shaped transfer functions. The source code of the proposed method along with the datasets used can be found at https://github.com/Rangerix/SocialMimic.
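As a rough illustration of how a transfer function turns a continuous position into a binary feature mask, the sketch below uses the textbook sigmoid (S-shaped) and |tanh| (V-shaped) forms with their usual update rules. These are generic baselines, not necessarily the exact eight variants evaluated in the paper, and the X-shaped function is not shown because its definition does not appear in this excerpt. The function names and sample values are illustrative assumptions.

```python
import math
import random

def s_shaped(x):
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def v_shaped(x):
    """|tanh(x)| transfer function: large |x| gives a high flip probability."""
    return abs(math.tanh(x))

def binarize_s(position, rng):
    """S-shaped rule: a bit is set to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position, current_bits, rng):
    """V-shaped rule: a bit is flipped with probability v_shaped(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]

# Hypothetical continuous position of one follower over three features.
rng = random.Random(0)
position = [-2.0, 0.0, 3.5]
mask = binarize_s(position, rng)        # 1 = feature kept, 0 = feature dropped
mask = binarize_v(position, mask, rng)  # V-shaped update of the same mask
```

Under the S-shaped rule the new bit is independent of the old one, whereas the V-shaped rule keeps the current bit unless the continuous value is large in magnitude.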

Highlights

In this era of computers and technology, with every advancement in fields such as image processing, pattern recognition, financial analysis, business management, and medical studies [1]–[4], we are bound to deal with huge amounts of data, whose dimensions are increasing every day.
The associate editor coordinating the review of this manuscript and approving it for publication was Larbi Boubchir.
We have used the K-Nearest Neighbor (KNN) [71] classifier with the Euclidean distance metric to measure the classification accuracy of the optimal feature subset selected by the Social Mimic Optimization (SMO) algorithm.
In k-fold cross-validation, the dataset is divided into k equal partitions, where k − 1 folds are used for training and the remaining fold is used for testing the classification model.
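The evaluation procedure described in the last two highlights can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the fold count, the number of neighbors, and all function names are assumptions, and the folds are contiguous and unshuffled for simplicity (in practice the data would normally be shuffled first).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(X, y, n_folds=5, k=5):
    """Each fold is tested once while the other folds are used for training."""
    correct = 0
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in test_idx)
    return correct / len(X)
```

To score a candidate feature subset, each row of `X` would first be reduced to the columns where the binary mask is 1.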

Summary

INTRODUCTION

In this era of computers and technology, with every advancement in fields such as image processing, pattern recognition, financial analysis, business management, and medical studies [1]–[4], we are bound to deal with huge amounts of data, whose dimensions are increasing every day. We chose SMO because it is simple to implement yet can produce effective results. In contrast to other popular meta-heuristic algorithms, it requires no algorithm-specific parameters other than the population size and the maximum number of iterations. Each follower (solution) is assessed by the proposed fitness function, which depends on the classification error rate of the K-Nearest Neighbor (KNN) classifier [71] and on the number of features selected. The time complexity of the proposed method is O(maxIter × popSize × D × t_fitness), where maxIter is the maximum number of iterations, popSize is the number of followers (individuals), D is the dimension of the problem under consideration, and t_fitness is the time required to calculate the fitness value of a particular individual using a given classifier. Note that using the X-shaped transfer function instead of S-shaped or V-shaped transfer functions does not alter the time complexity.
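A fitness function of the kind described above is commonly written as a weighted sum of the error rate and the fraction of features kept. The sketch below assumes that form; the exact formula and weight used by the authors are not given in this excerpt, so the weight `alpha`, its default value, and the handling of an empty subset are illustrative assumptions.

```python
def fitness(mask, error_rate, alpha=0.99):
    """Lower is better: trades off classification error against subset size.

    mask       -- list of 0/1 flags, one per feature (1 = selected)
    error_rate -- classifier error on the selected features, in [0, 1]
    alpha      -- hypothetical weight on the error term (assumed, not from the paper)
    """
    n_total = len(mask)
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty subset cannot be classified; treat it as the worst case.
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total
```

With equal error rates, the subset with fewer selected features receives the lower (better) fitness value, which matches the stated aim of minimizing both error and feature count.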

RESULTS AND DISCUSSION

COMPARISON

CONCLUSION AND FUTURE DIRECTIONS