Feature selection (FS), which reduces dataset size by finding optimal subsets of features, is one of the most common and challenging problems in machine learning. The Horse Herd Optimization Algorithm (HOA) is a recent metaheuristic that models the herd behavior of horses and was developed for large-scale optimization problems. This paper proposes a binary version of HOA as a wrapper method for the FS problem: the binary chaotic horse herd optimization algorithm for feature selection (BCHOAFS). BCHOAFS selects the feature combination that maximizes classification accuracy while minimizing the number of selected features. Machine learning classifiers were used to test the accuracy of the reduced subsets. Before chaotic maps were added, the method was named binary horse herd optimization for feature selection (BHOAFS), and the k-nearest neighbor (k-NN) and Support Vector Machine (SVM) classifiers were tested separately with it. k-NN gave better classification accuracy than SVM. The resulting BHOAFS-kNN method was then combined with five chaotic maps, yielding BCHOAFS-Logistics, BCHOAFS-Piecewise, BCHOAFS-Singer, BCHOAFS-Sinusoidal, and BCHOAFS-Tent. These versions were run on 18 datasets of different sizes and quality (low, medium, and large scale) taken from the UCI repository and compared with state-of-the-art algorithms from previous studies. The results show that the proposed method, especially BCHOAFS-Piecewise and BCHOAFS-Singer, outperforms or competes with well-known methods. The statistical significance of the results was validated using the Friedman test followed by a post hoc Wilcoxon signed-rank test.
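The wrapper objective described above, high accuracy with few features, can be illustrated with a minimal sketch. The weighted-sum form, the weight `alpha = 0.99`, the leave-one-out evaluation, and the toy data are all assumptions made for illustration; the abstract does not state the exact fitness function used by BCHOAFS.

```python
import math


def knn_loo_accuracy(rows, labels, mask, k=1):
    """Leave-one-out k-NN accuracy using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        # Distance from sample i to every other sample, on the selected columns.
        dists = sorted(
            (math.dist([row[j] for j in cols], [other[j] for j in cols]), labels[n])
            for n, other in enumerate(rows)
            if n != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == labels[i]:
            correct += 1
    return correct / len(rows)


def wrapper_fitness(rows, labels, mask, alpha=0.99):
    """Hypothetical fitness to minimize: weighted error plus feature ratio."""
    error = 1.0 - knn_loo_accuracy(rows, labels, mask)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
labels = [0, 0, 0, 1, 1, 1]

fitness_one_feature = wrapper_fitness(rows, labels, [1, 0])
fitness_all_features = wrapper_fitness(rows, labels, [1, 1])
```

On this toy data the subset containing only the informative feature scores a lower (better) fitness than the full feature set, which is the behavior a wrapper FS method relies on.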
The novelty of BCHOAFS is that it is the first binary, chaos-based algorithm for feature selection built on HOA, an optimization algorithm designed specifically for large-scale data. The paper also proposes a new local search strategy called the Similarity Measurement Function (SMF). The versions of BCHOAFS presented here can therefore be applied to the FS problem.
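To make "binary chaotic-based" concrete, the sketch below shows one common way such algorithms are assembled: a chaotic map generates a deterministic sequence in [0, 1], and that sequence replaces uniform random numbers as thresholds when a continuous position is converted to a 0/1 feature mask. The logistic map with parameter 4, the sigmoid transfer function, and the helper names are assumptions for illustration; the abstract does not specify how BCHOAFS applies its chaotic maps or binarizes positions.

```python
import math


def logistic_map(x, a=4.0):
    """One step of the logistic map x -> a * x * (1 - x)."""
    return a * x * (1.0 - x)


def chaotic_sequence(x0, n, step=logistic_map):
    """Iterate a chaotic map n times from seed x0 and collect the values."""
    values = []
    x = x0
    for _ in range(n):
        x = step(x)
        values.append(x)
    return values


def sigmoid(v):
    """S-shaped transfer function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def binarize(position, thresholds):
    """Select feature j when sigmoid(position[j]) exceeds its chaotic threshold."""
    return [1 if sigmoid(v) > t else 0 for v, t in zip(position, thresholds)]


# Hypothetical continuous position of one search agent over four features.
seq = chaotic_sequence(0.7, 4)
mask = binarize([2.0, -2.0, 0.5, -0.5], seq)
```

Other maps, such as the tent or sinusoidal map, could be passed through the `step` argument in the same way.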