Abstract

Feature selection filters out unrepresentative features from a given dataset in order to construct more effective learning models. Ensemble feature selection, which combines multiple feature selection methods, has been shown to outperform single feature selection. However, the performance of different (ensemble) feature selection methods has not been fully examined on multi-class imbalanced datasets. For class-imbalanced datasets, one widely considered solution is to re-balance the data by over-sampling, which generates synthetic examples for the minority classes. The effect of performing (ensemble) feature selection on over-sampled multi-class imbalanced datasets has likewise not been investigated. Therefore, the first research objective is to examine the performance of single and ensemble feature selection methods, built from fifteen well-known filter, wrapper, and embedded algorithms, in terms of classification accuracy. For the second research objective, the two possible orders of combining the feature selection and over-sampling steps are compared in order to identify the best combination procedure as well as the best combined algorithms. The experimental results, based on ten datasets from different domains with low to very high feature dimensions, show that ensemble feature selection methods perform slightly better than single ones, although the performance differences are small. When combined with the Synthetic Minority Oversampling Technique (SMOTE) over-sampling algorithm, performing feature selection first and over-sampling second outperforms the reverse order. Although the best combined algorithms are based on ensemble feature selection, eXtreme Gradient Boosting (XGBoost), the best single feature selection algorithm, combined with SMOTE provides classification performance very similar to that of the best combined algorithms.
Considering both classification performance and computational cost, the optimal solution is the combination of XGBoost and SMOTE.
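
The two orders of combining feature selection and over-sampling compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `smote_like` is a simplified stand-in for SMOTE (interpolation between same-class neighbours), a Fisher-style filter score stands in for the fifteen selection algorithms and for XGBoost, and the toy dataset and all function names are hypothetical. Only the Python standard library is assumed.

```python
import random
from collections import Counter


def smote_like(X, y, k=3, seed=0):
    """Simplified SMOTE stand-in: grow each minority class to the majority
    size by interpolating between a sample and a same-class neighbour."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [row[:] for row in X], list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        if len(pool) < 2:
            continue  # cannot interpolate from a single point
        for _ in range(target - n):
            base = rng.choice(pool)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (p for p in pool if p is not base),
                key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def select_top(X, y, m):
    """Hypothetical filter: rank features by between-class over within-class
    scatter and return the sorted indices of the m best features."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for label in set(y):
            vals = [v for v, lab in zip(col, y) if lab == label]
            mean = sum(vals) / len(vals)
            between += len(vals) * (mean - overall) ** 2
            within += sum((v - mean) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    best = sorted(range(len(scores)), key=lambda j: -scores[j])[:m]
    return sorted(best)


def fs_then_oversample(X, y, m):
    """Order 1: select features on the imbalanced data, then over-sample."""
    keep = select_top(X, y, m)
    X_reduced = [[row[j] for j in keep] for row in X]
    X_bal, y_bal = smote_like(X_reduced, y)
    return keep, X_bal, y_bal


def oversample_then_fs(X, y, m):
    """Order 2: over-sample in the full feature space, then select features."""
    X_bal, y_bal = smote_like(X, y)
    keep = select_top(X_bal, y_bal, m)
    return keep, [[row[j] for j in keep] for row in X_bal], y_bal


def make_toy(seed=1):
    """Toy 3-class imbalanced data: features 0 and 1 carry the class signal,
    features 2 and 3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in {0: 30, 1: 10, 2: 5}.items():
        for _ in range(n):
            X.append([label + rng.gauss(0, 0.3), -label + rng.gauss(0, 0.3),
                      rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy()
    for procedure in (fs_then_oversample, oversample_then_fs):
        keep, X_bal, y_bal = procedure(X, y, m=2)
        print(procedure.__name__, "kept", keep, "class sizes", dict(Counter(y_bal)))
```

Both procedures return a balanced dataset restricted to the selected features; they differ only in whether the selector sees the original or the synthetic examples. The sketch does not reproduce the reported result: comparing the two orders as in the abstract would require training a classifier on each output and measuring classification accuracy on held-out data.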
