Abstract

This research proposes a new hybrid approach for feature selection and Support Vector Machine (SVM) model selection based on a new variation of the Cohort Intelligence (CI) algorithm. Feature selection can improve the accuracy of classification algorithms and reduce their computational complexity by removing irrelevant and redundant features. SVM is a classification algorithm that has been used in many areas, such as bioinformatics and pattern recognition. However, the classification accuracy of SVM depends mainly on tuning its hyperparameters (i.e., SVM model selection). This paper presents a framework that comprises two major components. First, the Self-Adaptive Cohort Intelligence (SACI) algorithm is proposed, which is a new variation of the emerging metaheuristic algorithm Cohort Intelligence (CI). Second, SACI is integrated with SVM, resulting in a new hybrid approach referred to as SVM–SACI for simultaneous feature selection and SVM model selection. SACI differs from CI by employing tournament-based mutation and a self-adaptive scheme for the sampling interval and mutation rate. Furthermore, SACI is both real-coded and binary-coded, which makes it directly applicable to both binary and continuous domains. The performance of SACI for feature selection and SVM model selection was examined using ten benchmark datasets from the literature and compared with those of CI and five well-known metaheuristics, namely, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE) and Artificial Bee Colony (ABC). The comparative results demonstrate that SACI outperformed CI and was comparable to or better than the other compared metaheuristics in terms of SVM classification accuracy and dimensionality reduction. In addition, SACI requires less tuning effort because, owing to its self-adaptive scheme, it has fewer control parameters than the other compared metaheuristics. Finally, because metaheuristic search strategies require relatively long training times on high-dimensional or large datasets, this research suggests employing more efficient methods for such datasets.
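
To make the idea of simultaneous feature selection and SVM model selection concrete, the following is a minimal sketch of how a single candidate solution could combine a binary-coded feature mask with real-coded hyperparameters. It is not the SVM–SACI implementation. Everything specific here is an assumption made for illustration: the names `Candidate`, `random_candidate`, `decode`, and `fitness`, the choice of an RBF-style pair of hyperparameters (`C` and `gamma`), the log2 search ranges, and the weighted fitness that trades accuracy against dimensionality reduction. The classifier itself is left as a caller-supplied function.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical mixed encoding: a binary part and a real-coded part."""
    mask: List[int]      # binary-coded: 1 keeps the feature, 0 drops it
    log2_c: float        # real-coded: log2 of the SVM penalty parameter C
    log2_gamma: float    # real-coded: log2 of an assumed RBF kernel width


def random_candidate(
    n_features: int,
    rng: random.Random,
    c_range: Tuple[float, float] = (-5.0, 15.0),      # assumed search range
    gamma_range: Tuple[float, float] = (-15.0, 3.0),  # assumed search range
) -> Candidate:
    """Sample a candidate uniformly; guarantee at least one selected feature."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    if not any(mask):
        mask[rng.randrange(n_features)] = 1
    return Candidate(mask, rng.uniform(*c_range), rng.uniform(*gamma_range))


def decode(candidate: Candidate) -> Tuple[List[int], float, float]:
    """Turn the encoding into (selected feature indices, C, gamma)."""
    selected = [i for i, bit in enumerate(candidate.mask) if bit]
    return selected, 2.0 ** candidate.log2_c, 2.0 ** candidate.log2_gamma


def fitness(
    candidate: Candidate,
    accuracy_fn: Callable[[Sequence[int], float, float], float],
    weight: float = 0.9,  # assumed trade-off between accuracy and reduction
) -> float:
    """Weighted sum of classifier accuracy and fraction of features removed."""
    selected, c, gamma = decode(candidate)
    accuracy = accuracy_fn(selected, c, gamma)
    reduction = 1.0 - len(selected) / len(candidate.mask)
    return weight * accuracy + (1.0 - weight) * reduction
```

In practice, `accuracy_fn` would train an SVM on the selected columns with the decoded hyperparameters and return, for example, a cross-validated accuracy. A metaheuristic such as SACI would then search over candidates to maximize a fitness of this general kind.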
