Abstract

Feature selection is a technique commonly used in Data Mining and Machine Learning. Traditional feature selection methods, when applied to large datasets, generate a large number of feature subsets. Selecting optimal features within this high-dimensional data space is time-consuming and negatively affects the system's performance. This paper proposes a new binary Salp Swarm Algorithm (bSSA) for selecting the best feature set from transformed datasets. The proposed feature selection method first transforms the original dataset using hybrid data transformation methods based on Principal Component Analysis (PCA) and fast Independent Component Analysis (fastICA); next, a binary Salp Swarm optimizer is used to find the best features. The proposed feature selection approach improves accuracy and eliminates the selection of irrelevant features. We validate our technique on fifteen benchmark datasets and conduct an extensive study to measure the performance and feature selection accuracy of the proposed technique. The proposed bSSA is compared to Binary Genetic Algorithm (bGA), Binary Binomial Cuckoo Search (bBCS), Binary Grey Wolf Optimizer (bGWO), Binary Competitive Swarm Optimizer (bCSO), and Binary Crow Search Algorithm (bCSA). The proposed method attains a mean accuracy of 95.26% using 7.78% of the features on PCA-fastICA transformed datasets. The results show that bSSA outperforms the existing methods on the majority of the performance measures.

Highlights

  • In the age of Big Data, the Internet of Things, and pervasive communication, massive amounts of unstructured and high-dimensional data are generated by millions of ubiquitous devices every day

  • To select the optimal feature set, the proposed binary Salp Swarm Algorithm (bSSA) first transforms the original data using a hybrid of Principal Component Analysis (PCA) and fast Independent Component Analysis (fastICA), then performs feature selection on the transformed dataset

  • The proposed bSSA attains mean accuracies of 93.97%, 94.56%, and 94.73% on the original, PCA-transformed, and ICA-transformed datasets, respectively
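
The two-stage transformation named in the highlights can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the hybrid applies PCA first (keeping and whitening the top components) and then runs a symmetric FastICA with a tanh nonlinearity on the result. The function names (`pca_whiten`, `fast_ica`, `hybrid_transform`), the nonlinearity, and the number of retained components are all assumptions made for illustration.

```python
import numpy as np


def pca_whiten(X, n_components):
    """Project X onto its top principal components, scaled to unit variance.

    Assumes n_components does not exceed the rank of the centered data.
    """
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; columns of U are the normalised PC scores.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * np.sqrt(X.shape[0] - 1)


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(Z, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA on already-whitened data Z of shape (n_samples, k)."""
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(max_iter):
        Y = Z @ W.T                      # current component estimates
        G = np.tanh(Y)                   # nonlinearity g
        G_prime = 1.0 - G ** 2           # its derivative g'
        # Fixed-point update: E[g(y) z] - E[g'(y)] w, one row per component.
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


def hybrid_transform(X, n_components):
    """Assumed hybrid: PCA whitening followed by a FastICA rotation."""
    return fast_ica(pca_whiten(X, n_components))


# Toy usage: two non-Gaussian sources mixed into six observed features.
rng = np.random.default_rng(1)
sources = np.column_stack([rng.uniform(-1, 1, 500), rng.laplace(size=500)])
X = sources @ rng.standard_normal((2, 6)) + 0.01 * rng.standard_normal((500, 6))
T = hybrid_transform(X, n_components=2)
print(T.shape)
```

For a data matrix `X` of shape (samples, features), `hybrid_transform(X, k)` returns `k` decorrelated, unit-variance columns. A feature selector would then search over subsets of those columns rather than the original features.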

Summary

INTRODUCTION

In the age of Big Data, the Internet of Things, and pervasive communication, massive amounts of unstructured and high-dimensional data are generated by millions of ubiquitous devices every day.

Shekhawat et al.: bSSA With Hybrid Data Transformation for Feature Selection

[...] using recursive or backward feature elimination or forward feature selection techniques. These methods are computationally expensive compared to the filter methods, and their complexity increases unexpectedly for datasets comprising a huge set of features. Uğuz [13] presented a hybrid approach based on a Genetic Algorithm (GA) and information gain for selecting an optimal feature set from a dataset transformed by PCA. The proposed bSSA first converts the original dataset into a new set of features using PCA-fastICA based data transformation methods, and then performs feature selection using bSSA.
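
The feature selection stage can be illustrated with a small binary Salp Swarm sketch. It follows the standard continuous SSA updates (one leader moving around the best solution found so far, each follower averaging with its predecessor) and maps positions to bit masks through a sigmoid transfer function. The text above does not state which transfer function, bounds, or fitness the paper uses, so all of these are assumptions. In particular, `toy_fitness` is a hypothetical stand-in for a classifier's error rate, included only so the example runs without a dataset.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """S-shaped transfer: coordinate x becomes 1 with probability sigmoid(x)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def bssa(fitness, dim, n_salps=20, n_iter=100, lb=-4.0, ub=4.0, seed=0):
    """Minimise fitness(mask) over binary masks of length dim."""
    rng = random.Random(seed)
    salps = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    food, best_mask, best_fit = None, None, float("inf")
    for it in range(1, n_iter + 1):
        # Evaluate every salp and keep the best solution seen so far (the "food").
        for pos in salps:
            mask = binarize(pos, rng)
            fit = fitness(mask)
            if fit < best_fit:
                food, best_mask, best_fit = pos[:], mask, fit
        # c1 decays over time, shifting from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i, pos in enumerate(salps):
            if i == 0:
                # Leader: move around the food source.
                for j in range(dim):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    pos[j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
            else:
                # Follower: average with the (already updated) salp ahead of it.
                prev = salps[i - 1]
                for j in range(dim):
                    pos[j] = 0.5 * (pos[j] + prev[j])
            for j in range(dim):
                pos[j] = min(ub, max(lb, pos[j]))
    return best_mask, best_fit


def toy_fitness(mask, relevant=frozenset({1, 4, 7}), alpha=0.9):
    """Hypothetical fitness: weighted sum of a proxy error and the feature ratio.

    The proxy error is the fraction of 'relevant' features left unselected;
    a real wrapper would use a classifier's validation error instead.
    """
    missed = sum(1 for j in relevant if not mask[j]) / len(relevant)
    ratio = sum(mask) / len(mask)
    return alpha * missed + (1 - alpha) * ratio


best_mask, best_fit = bssa(toy_fitness, dim=10)
print(best_mask, round(best_fit, 3))
```

The weighted fitness rewards masks that keep the proxy error low while selecting few features, which mirrors the trade-off between accuracy and the percentage of selected features reported in the abstract. The weight `alpha=0.9` is an arbitrary choice for this sketch.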

FAST INDEPENDENT COMPONENT ANALYSIS
PROPOSED BINARY SALP SWARM FEATURE SELECTION METHOD
RESULTS AND DISCUSSION
CONCLUSION