Abstract

The rapid increase in data volume and feature dimensionality has a negative influence on machine learning and many other fields, decreasing classification accuracy and increasing computational cost. Feature selection plays a critical role as a preprocessing step in mitigating these issues. It works by eliminating features that may degrade classifier performance, such as irrelevant, redundant, and less informative features. This paper introduces an improved Harris hawks optimization (IHHO) that utilizes elite opposition-based learning and proposes a new search mechanism. Harris hawks optimization (HHO) is a recently introduced general-purpose metaheuristic for continuous search problems. Compared to conventional HHO, the proposed IHHO avoids becoming trapped in local optima and has an enhanced search mechanism that relies on mutation, mutation neighborhood search, and rollback strategies to strengthen its search capabilities. Moreover, it improves population diversity and computational accuracy and accelerates convergence. To evaluate the performance of IHHO, we conducted a series of experiments on twenty benchmark datasets collected from the UCI repository and the scikit-feature project. The datasets span low, moderate, and high feature dimensionality. Four criteria were adopted to assess the superiority of IHHO: classification accuracy, fitness value, number of selected features, and statistical tests. Furthermore, IHHO was compared against other well-known algorithms: Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA), Particle Swarm Optimization (PSO), Ant Lion Optimizer (ALO), Whale Optimization Algorithm (WOA), Butterfly Optimization Algorithm (BOA), and Slime Mould Algorithm (SMA). The experimental results confirm the dominance of IHHO over the other algorithms in classification accuracy, fitness value, and number of selected features.
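The abstract does not spell out the elite opposition-based learning step, but a common formulation computes each solution's opposite within the dynamic bounds of the current elite group and then keeps the fittest survivors from the merged populations. The sketch below follows that common formulation only; the elite size `n_elite` and the out-of-range repair rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def elite_opposition_step(pop, eval_fn, n_elite=5, rng=None):
    """One elite opposition-based learning step (minimal sketch).

    pop     : (N, D) array of candidate solutions
    eval_fn : maps a (D,) solution to a scalar fitness (lower is better)
    Returns the N fittest solutions drawn from the originals and
    their elite-based opposites.
    """
    rng = rng or np.random.default_rng()
    n, d = pop.shape
    fit = np.apply_along_axis(eval_fn, 1, pop)
    elite = pop[np.argsort(fit)[:n_elite]]            # current elite group
    lo, hi = elite.min(axis=0), elite.max(axis=0)     # dynamic bounds [da, db]
    k = rng.random((n, 1))                            # random scale in (0, 1)
    opp = k * (lo + hi) - pop                         # elite opposite solutions
    outside = (opp < lo) | (opp > hi)                 # repair out-of-range genes
    opp[outside] = (lo + rng.random((n, d)) * (hi - lo))[outside]
    merged = np.vstack([pop, opp])
    merged_fit = np.concatenate([fit, np.apply_along_axis(eval_fn, 1, opp)])
    return merged[np.argsort(merged_fit)[:n]]         # greedy survivor selection
```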

Highlights

  • The growth in data volume and feature dimensionality in recent years has caused severe difficulties for researchers in fields such as big data, data mining, and data science

  • Each algorithm is applied to all the datasets to assess its stability across varying feature dimensionality

  • Harris hawks optimization (HHO) devotes two phases to exploration and four phases to exploitation, as shown in the sketch after this list
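For reference, the control logic that routes between those six phases in the original HHO (Heidari et al., 2019) can be sketched as follows: the prey's escaping energy E switches between exploration and exploitation, and a random draw r selects among the four exploitation strategies. The IHHO additions (mutation, mutation neighborhood search, rollback) are not shown here.

```python
import numpy as np

def hho_phase(t, T, rng=None):
    """Pick which HHO phase applies at iteration t of T (minimal sketch)."""
    rng = rng or np.random.default_rng()
    e0 = rng.uniform(-1, 1)                 # initial energy of the prey
    e = 2 * e0 * (1 - t / T)                # escaping energy, decays over time
    if abs(e) >= 1:
        return "exploration"                # random perch / perch near family members
    r = rng.random()                        # prey's chance of escaping
    if r >= 0.5:
        return "soft besiege" if abs(e) >= 0.5 else "hard besiege"
    if abs(e) >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"
```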



Introduction

The growth in data volume and feature dimensionality in recent years has caused severe difficulties for researchers in many fields, such as big data, data mining, and data science. It is well known that the analysis of high-dimensional data suffers from problems of dimensionality, sparsity, and complexity [1]. Applying a feature selection technique is therefore essential to reduce the number of features. Feature selection works by removing noisy and irrelevant features from the dataset. To deal with high-dimensional features in machine learning, it is common to apply feature selection to select the most informative subset of features.
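Wrapper-based feature selection of this kind typically scores a candidate feature subset by trading classification error against subset size, which matches the evaluation criteria listed in the abstract (accuracy, fitness value, number of selected features). The sketch below shows one common fitness formulation; the weight alpha = 0.99 and the 5-NN wrapper classifier are illustrative assumptions, not details taken from this paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fs_fitness(mask, X, y, alpha=0.99):
    """Wrapper fitness for a binary feature-selection mask (lower is better).

    A common formulation in metaheuristic feature selection:
        fitness = alpha * error_rate + (1 - alpha) * (|selected| / |all|)
    """
    if not mask.any():                      # an empty subset is invalid
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=5).mean()
    return alpha * (1 - acc) + (1 - alpha) * mask.sum() / mask.size
```

An optimizer such as IHHO would binarize each continuous hawk position into such a mask (e.g., via a transfer function) and minimize this fitness.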
