Abstract

Feature selection (FS) is mainly used as a pre-processing step that reduces dimensionality by eliminating irrelevant or redundant features before the data are passed to a machine learning or data mining algorithm. In this paper, we introduce binary variants of a recently proposed meta-heuristic algorithm called Social Ski Driver (SSD) optimization. To the best of our knowledge, SSD has not yet been used in the domain of FS. Two binary variants of SSD are proposed, using S-shaped and V-shaped transfer functions. In addition, the exploitation ability of SSD is improved by a local search method called Late Acceptance Hill Climbing (LAHC). The hybrid meta-heuristic is then converted to a binary version using the same transfer functions. The proposed methods are applied to 18 standard UCI datasets and compared with 15 state-of-the-art FS methods. To check the robustness of the proposed method, we also apply it to 3 high-dimensional microarray datasets and compare it with 6 state-of-the-art methods. The results confirm the superiority of the proposed methods over the other meta-heuristic wrapper-based FS methods considered here. Source code of this work is available at https://github.com/consigliere19/SSD-LAHC.
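As an illustration of how S-shaped and V-shaped transfer functions can map a continuous position to a binary feature mask, the following is a minimal sketch only. It assumes the commonly used sigmoid and |tanh| forms and the usual "set with probability S(x)" and "flip with probability V(x)" update rules; the exact functions and rules used in the paper may differ, and all function names here are hypothetical.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function (assumed form), written via tanh for numerical stability."""
    return 0.5 * (1.0 + math.tanh(x / 2.0))


def v_shaped(x):
    """V-shaped transfer function (assumed form): |tanh(x)|, symmetric around zero."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: each bit becomes 1 with probability S(x), otherwise 0."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: each bit is flipped with probability V(x), otherwise kept."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    position = [-2.0, -0.1, 0.0, 0.4, 3.0]  # hypothetical continuous agent position
    mask = binarize_s(position, rng)         # 1 = feature selected, 0 = feature dropped
    print(mask)
    print(binarize_v(position, mask, rng))
```

Under the S-shaped rule the new bit depends only on the continuous value, whereas the V-shaped rule depends on the previous bit as well, so large magnitudes in either direction make a flip more likely.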

Highlights

  • With the recent advances in technology, a huge amount of data has become available in different domains such as image processing, pattern recognition, and disease diagnosis systems [1]

  • The analysis of our algorithm shows that its time complexity is O(iter ∗ psize ∗ λ2 ∗), where iter is the maximum number of iterations, psize is the population size, λ is the parameter of Late Acceptance Hill Climbing (LAHC), tfitness is the complexity of calculating the fitness of a particular agent using the classifier, and dim is the dimension of the dataset

  • The feature selection (FS) problem is formulated as a multi-objective optimization task, with a fitness function that aims for high classification accuracy with a low number of selected features
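The two goals named in the last highlight are often combined into a single score in wrapper-based FS. The sketch below assumes a weighted-sum form, which is a common choice but is not confirmed by the text above; the function name, the weight `alpha`, and its default value are hypothetical.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness (assumed form); lower values are better.

    error_rate : classification error of the wrapped classifier, in [0, 1]
    n_selected : number of features kept by the candidate mask
    n_total    : total number of features in the dataset
    alpha      : hypothetical weight trading accuracy against subset size
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    feature_ratio = n_selected / n_total
    return alpha * error_rate + (1.0 - alpha) * feature_ratio


if __name__ == "__main__":
    # Two hypothetical candidates with equal error: the smaller subset scores lower.
    print(fitness(0.10, 3, 10))
    print(fitness(0.10, 8, 10))
```

With a weight close to 1 the error term dominates, and the subset size mainly acts as a tie-breaker between candidates of similar accuracy.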


Summary

Introduction

With the recent advances in technology, a huge amount of data has become available in different domains such as image processing, pattern recognition, and disease diagnosis systems [1]. Data dimensionality has a major impact on the performance of machine learning and data mining tasks, in terms of both the time and the storage required of the computing devices. In this context, it can be noted that the data themselves may contain some redundancy. Not all the features designed to represent a pattern or an image are important for its classification or analysis.
