Abstract

In practice, classification problems arise in many scientific fields, including finance, medicine and industry, so developing effective and accurate classification models is critically important. Although numerous useful classifiers have been proposed, many of them are unstable, sensitive to noise, or computationally slow. To overcome these drawbacks, combining feature selection techniques with traditional machine learning models can be of great help. In this paper, a novel feature selection method called the opposition-based seagull optimization algorithm (OSOA) is proposed and studied. The OSOA builds on the seagull optimization algorithm (SOA), with its initial population generated by opposition-based learning (OBL). To evaluate its overall classification performance, several measures are adopted, including classification accuracy, number of selected features, the receiver operating characteristic (ROC) curve, and computation time. The empirical results indicate that the suggested method achieves higher or comparable accuracy and computational efficiency relative to genetic algorithm (GA)-, simulated annealing (SA)-, and Fisher score (FS)-based classification models. They also show that the OSOA is a computationally efficient feature selection technique capable of selecting relevant variables, and that it performs well on high-dimensional data in which the number of variables exceeds the number of samples. Thus, the OSOA is an effective approach for enhancing classification performance.
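The OBL idea described above evaluates each random candidate together with its "opposite" point in the search space and keeps the fitter of the two, which tends to seed the SOA with a better-spread population. A minimal sketch of such an initialization, assuming minimization and illustrative function and parameter names (not taken from the paper):

```python
import numpy as np

def obl_init(n_pop, dim, lb, ub, fitness, rng=None):
    """Opposition-based population initialization (sketch).

    Draws n_pop random points in [lb, ub]^dim, forms their opposites
    x_opp = lb + ub - x, and keeps the n_pop candidates with the best
    (lowest) fitness among the combined set.
    """
    rng = np.random.default_rng(rng)
    pop = rng.uniform(lb, ub, size=(n_pop, dim))   # random candidates
    opp = lb + ub - pop                            # opposite candidates
    both = np.vstack([pop, opp])
    scores = np.apply_along_axis(fitness, 1, both)
    keep = np.argsort(scores)[:n_pop]              # fittest n_pop points
    return both[keep]
```

The opposite of a point is its mirror image about the midpoint of the bounds, so for bounds [0, 10] the opposite of 2 is 8; evaluating both doubles the chance that at least one of each pair lies near a good region.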

Highlights

  • With the development of computer and information techniques, large amounts of data are being generated from numerous sources, including economic activities, public administration and other scientific research fields [1]

  • We have shown the advantages of the proposed method on different datasets, including high-dimensional datasets, by comparison with state-of-the-art feature selection methods such as the Fisher score, simulated annealing and the genetic algorithm

  • In this study, a novel hybrid classification approach is suggested by combining feature selection and machine learning methods

Summary

INTRODUCTION

With the development of computer and information techniques, large amounts of data are being generated from numerous sources, including economic activities, public administration and other scientific research fields [1]. Many intelligent optimization algorithms have been adopted to build wrapper feature selection methods, but these approaches become infeasible when the number of features is large. To overcome this shortcoming, some regularization methods have been applied as feasible approaches to high-dimensional problems. We have shown the advantages of the proposed method on different datasets, including high-dimensional datasets, by comparison with state-of-the-art feature selection methods such as the Fisher score, simulated annealing and the genetic algorithm. The Fisher score (FS) is a filter method, based on the Fisher criterion, which has the ability to select the most relevant features. LASSO can be seen as a continuous and stable feature selection method: it produces a sparse solution and makes the model easier to interpret by adjusting the regularization parameter λ. The GA has the ability to process large search spaces [45], [46].
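The Fisher score mentioned above ranks each feature by the ratio of its between-class variance to its within-class variance; a higher score means the feature separates the classes more cleanly. A minimal sketch under the usual formulation (the function name is illustrative):

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score (sketch).

    For feature j: sum_k n_k * (mu_kj - mu_j)^2 / sum_k n_k * var_kj,
    where k runs over classes, mu_j is the overall mean of feature j,
    and mu_kj / var_kj are the class-conditional mean and variance.
    """
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        nc = Xc.shape[0]
        num += nc * (Xc.mean(axis=0) - mu) ** 2   # between-class spread
        den += nc * Xc.var(axis=0)                # within-class spread
    return num / np.maximum(den, 1e-12)           # guard zero variance
```

As a filter method, the scores are computed once from the labeled data and the top-ranked features are kept, independently of any downstream classifier.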

CLASSIFICATION METHODS
FEATURE SELECTION STAGE
1: Input the training dataset and initialize the parameters of the SOA
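Only the first step of the algorithm listing survives above: the feature selection stage begins by initializing an SOA population over the training features. A hypothetical sketch of the wrapper loop this step begins, assuming the standard SOA migration/attack update and a 0.5 threshold to binarize positions into feature subsets; the fitness function, parameter names, and defaults below are illustrative, not the paper's:

```python
import numpy as np

def soa_feature_select(fitness, dim, n_pop=20, max_iter=50, f_c=2.0, rng=None):
    """Wrapper feature selection via a simplified SOA (sketch).

    Positions live in [0, 1]^dim; a position binarized at 0.5 gives a
    candidate feature subset, scored by `fitness` (lower is better).
    """
    rng = np.random.default_rng(rng)
    pos = rng.uniform(0, 1, size=(n_pop, dim))       # step 1: init population
    scores = np.array([fitness(p > 0.5) for p in pos])
    best = pos[scores.argmin()].copy()
    best_score = scores.min()
    for t in range(max_iter):
        A = f_c - t * f_c / max_iter                 # migration control, f_c -> 0
        for i in range(n_pop):
            C = A * pos[i]                           # collision avoidance
            B = 2 * A**2 * rng.random()
            M = B * (best - pos[i])                  # move toward best seagull
            D = np.abs(C + M)
            k = rng.uniform(0, 2 * np.pi)            # spiral attack behavior
            r = np.exp(0.1 * k)                      # spiral radius (u = 1, v = 0.1 assumed)
            x_, y_, z_ = r * np.cos(k), r * np.sin(k), r * k
            pos[i] = np.clip(D * x_ * y_ * z_ + best, 0, 1)
            s = fitness(pos[i] > 0.5)
            if s < best_score:
                best_score, best = s, pos[i].copy()
    return best > 0.5                                # selected-feature mask
```

In the OSOA, the random initialization in step 1 would additionally be filtered through opposition-based learning before the main loop starts.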
Findings
CONCLUSION