Abstract
In classification, regression, and other data mining applications, feature selection (FS) is an important preprocessing step that helps avoid the adverse effects of noisy, misleading, and inconsistent features on model performance. Formulating it as a global combinatorial optimization problem, researchers have employed metaheuristic algorithms to select the prominent features, simplifying high-dimensional datasets and improving their quality in order to devise efficient knowledge extraction systems. However, when applied to datasets with a very large number of features, these methods often become trapped in local optima because the solution space is so large. In this study, we propose a novel approach to dimensionality reduction that uses the Henry gas solubility optimization (HGSO) algorithm to select significant features and thereby enhance classification accuracy. Using several datasets with a wide range of feature counts, from small to massive, the proposed method is evaluated against well-known metaheuristic algorithms including the grasshopper optimization algorithm (GOA), whale optimization algorithm (WOA), dragonfly algorithm (DA), grey wolf optimizer (GWO), salp swarm algorithm (SSA), and others from the recent relevant literature. We used k-nearest neighbor (k-NN) and support vector machine (SVM) as expert systems to evaluate the selected feature sets. Wilcoxon’s rank-sum non-parametric statistical test was carried out at the 5% significance level to judge whether the results of the proposed algorithm differ from those of the compared algorithms in a statistically significant way. Overall, the empirical analysis suggests that the proposed approach is significantly effective on low-dimensional as well as considerably high-dimensional datasets, reaching 100% accuracy on classification problems with more than 11,000 features.
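To illustrate the wrapper-style evaluation described above, the following minimal Python sketch scores a binary feature mask by the k-NN error it produces. Several parts are assumptions made for illustration and are not taken from the paper: the fitness form (a weighted sum of error rate and the fraction of features kept, with a hypothetical weight `alpha`), the toy dataset, and the helper names. HGSO itself is not implemented here; exhaustive enumeration of all masks stands in for the metaheuristic search, which is only feasible because the toy problem has two features.

```python
from itertools import product


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def apply_mask(X, mask):
    """Keep only the columns whose mask entry is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def fitness(mask, train_X, train_y, test_X, test_y, k=3, alpha=0.99):
    """Lower is better. Assumed form: alpha * error + (1 - alpha) * kept/total."""
    if not any(mask):
        return 1.0  # an empty subset cannot classify anything
    tr, te = apply_mask(train_X, mask), apply_mask(test_X, mask)
    errors = sum(knn_predict(tr, train_y, x, k) != y for x, y in zip(te, test_y))
    return alpha * errors / len(test_y) + (1 - alpha) * sum(mask) / len(mask)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [0.1, -5.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [1.2, -5.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, -4.5], [1.05, 4.5]]
test_y = [0, 1]

# Exhaustive search over all masks; a metaheuristic would replace this loop
# when the number of features makes enumeration impossible.
best_mask = min(
    product([0, 1], repeat=2),
    key=lambda m: fitness(m, train_X, train_y, test_X, test_y),
)
print(best_mask)
```

On this toy data the mask that keeps only the informative feature scores best, because the noisy feature dominates the distance computation and misleads k-NN. In a real study the error would be estimated by cross-validation rather than a single split.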