Abstract

Feature selection is a key success factor for classification problems with high-dimensional and large datasets. In this paper, we introduce an approach for enhancing the classification performance on high-dimensional datasets by combining genetic algorithms for feature selection with a one-class SVM for classification. The proposed approach is suitable for high-dimensional and large datasets, and it can be used when only observations from a single class are available and when high classification accuracy is required. Two benchmark datasets were taken from the NIPS 2003 variable selection competition and the UCI Machine Learning Repository to span a variety of domains and difficulties. Results show that applying feature selection prior to classification yields higher prediction accuracy than classification without feature selection. The approach can also outperform classifiers such as random forest, especially on datasets with a very large number of features and a small number of observations, such as the ARCENE dataset.
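To make the described pipeline concrete, the sketch below shows one plausible way to wrap a genetic algorithm around a one-class SVM: binary chromosomes encode feature subsets, and each subset is scored by how well the SVM trained on it accepts held-out positive samples. This is a minimal illustration, not the authors' implementation; the GA operators, the fitness criterion, and all parameter values are assumptions chosen for brevity.

```python
# Sketch: genetic-algorithm feature selection followed by one-class SVM
# classification. All operators and parameters below are illustrative
# assumptions, not the paper's reported configuration.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def fitness(mask, X_train, X_val):
    """Score a feature mask: fraction of held-out positive samples the
    one-class SVM accepts (an assumed, simplified fitness criterion)."""
    if mask.sum() == 0:
        return 0.0
    model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
    model.fit(X_train[:, mask])
    return float(np.mean(model.predict(X_val[:, mask]) == 1))

def ga_select(X_train, X_val, pop_size=20, generations=15, p_mut=0.02):
    """Tiny genetic algorithm over binary feature masks."""
    n_features = X_train.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5            # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X_train, X_val) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]                  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < p_mut            # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(m, X_train, X_val) for m in pop])
    return pop[int(np.argmax(scores))]

# Toy usage on synthetic one-class data (a stand-in for ARCENE-like inputs).
X = rng.normal(size=(120, 50))
best_mask = ga_select(X[:80], X[80:])
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X[:80][:, best_mask])
print("selected features:", int(best_mask.sum()))
```

In this sketch the validation split stands in for whatever model-selection protocol the paper uses; in practice the fitness function could also penalize subset size to favour smaller feature sets.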
