A Spark-based Distributed Whale Optimization Algorithm for Feature Selection

Hongwei Chen, Lin Han, Jun Zeng, Zhou Hu, Jiansen Yuan, Zhiwei Ye, Qiao Hou

doi:10.1109/idaacs.2019.8924334

Abstract

With the rapid development of the Internet and big data technologies, the volume of high-dimensional data generated in various fields has increased dramatically. Feature selection is an effective way to address the data processing problems caused by high dimensionality and high computational complexity. Traditional feature selection methods show low classification accuracy and low processing efficiency when dealing with high-dimensional, large-scale data. This paper proposes a feature selection method based on the Whale Optimization Algorithm that learns feature selection rules and thereby improves the accuracy of feature selection. However, when the data size is very large, execution on a single node is inefficient. Therefore, this paper combines the Whale Optimization Algorithm with the parallel computing model of the Spark platform and proposes a distributed Whale Optimization Algorithm for feature selection on Spark. The results show that the strong search ability of the Whale Optimization Algorithm, combined with the speed of distributed computation, solves the feature selection optimization model efficiently.
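To make the idea concrete, the following is a minimal single-node sketch of Whale Optimization Algorithm feature selection. It is not the authors' implementation. It uses the standard WOA update rules (shrinking encirclement, random search, spiral update), and several choices are assumptions made for illustration: the names `woa_select` and `to_mask`, the sigmoid threshold used to turn a continuous position into a feature mask, the clamp to [-4, 4], the spiral constant b = 1, and a caller-supplied `fitness` function that scores a mask (higher is better).

```python
import math
import random


def to_mask(position):
    """Binarize a continuous position: feature j is kept if sigmoid(x_j) > 0.5.

    The sigmoid transfer function is an assumed choice, not taken from the paper.
    """
    return [1.0 / (1.0 + math.exp(-x)) > 0.5 for x in position]


def woa_select(fitness, n_features, n_whales=10, n_iter=30, seed=0):
    """Search for a feature mask that maximizes `fitness` (hypothetical helper)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
              for _ in range(n_whales)]
    best_pos, best_fit = None, float("-inf")

    for t in range(n_iter):
        # Scoring each whale is independent of the others, so this is the step
        # a Spark version could distribute, for example with
        # sc.parallelize(whales).map(...). That mapping is an assumption here.
        scores = [fitness(to_mask(w)) for w in whales]
        for w, s in zip(whales, scores):
            if s > best_fit:
                best_pos, best_fit = list(w), s

        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best_pos if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - x) for r, x in zip(ref, w)]
            else:
                # Spiral update around the best whale, with spiral constant b = 1.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(l) * math.cos(2.0 * math.pi * l)
                new = [abs(b - x) * spiral + b for b, x in zip(best_pos, w)]
            # Clamp positions so the sigmoid stays numerically safe (assumed bound).
            whales[i] = [max(-4.0, min(4.0, v)) for v in new]

    return to_mask(best_pos), best_fit
```

In practice `fitness` would typically train and evaluate a classifier on the selected columns, which is the expensive part and the reason distributing the per-whale evaluation pays off on large data.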

Full Text