A Hybrid Feature Selection Optimization Model for High Dimension Data Classification

Mohammed Qaraad,Ibrahim I M Manhrawy,Hanaa Fathi,Souad Amjad,Passent El Kafrawy,Bayoumi Ali Hassan

doi:10.1109/access.2021.3065341

Abstract

Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. For high-dimensional data, the goal of feature selection is to find the smallest subset of features that is still the most informative. In this paper, we propose a hybrid feature selection optimization model for cancer classification called ENSVM. The model uses the Elastic Net (EN) method, which performs regularization and variable selection, for gene selection on genomic microarray data. We applied three different optimization techniques, namely Social Ski-Driver (SSD), Randomized SearchCV (RS), and Elastic NetCV (ENCV), to determine the Elastic Net parameters, combined with a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compared the results of applying ENSVM to seven genomic microarray datasets with those of the SSD-SVM model and of an SVM with an RBF kernel without any feature selection method. The comparison showed that ENSVM selects a feature subset that maximizes classification performance. Accordingly, minimizing the number of features is significant when analyzing high-dimensional data, for both performance and accuracy. Moreover, the ENSVM model is superior to the SSD-SVM model.
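The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes scikit-learn, covers only the ElasticNetCV variant (no SSD or randomized search), uses synthetic data in place of the microarray datasets, and treats the 0/1 class label as a regression target for the Elastic Net. The function names `elastic_net_mask` and `run_demo` and all parameter values are hypothetical choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def elastic_net_mask(X, y, l1_ratio=0.5, cv=5):
    """Return a boolean mask of features with non-zero Elastic Net coefficients."""
    # Assumption: the binary label is cast to float and fitted as a regression target.
    model = ElasticNetCV(l1_ratio=l1_ratio, cv=cv, max_iter=10000)
    model.fit(X, y.astype(float))
    mask = model.coef_ != 0
    # Fallback (illustrative choice): keep every feature if all coefficients are zero.
    if not mask.any():
        mask[:] = True
    return mask


def run_demo(random_state=0):
    """Compare an RBF SVM on all features with one on Elastic Net selected features."""
    # Synthetic stand-in for microarray data: many more features than samples.
    X, y = make_classification(
        n_samples=80, n_features=500, n_informative=10, n_redundant=0,
        class_sep=2.0, random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Standardize using training statistics only.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Select features on the training split only, to avoid leaking test data.
    mask = elastic_net_mask(X_train, y_train)

    baseline = SVC(kernel="rbf").fit(X_train, y_train)
    reduced = SVC(kernel="rbf").fit(X_train[:, mask], y_train)

    return {
        "n_features": X.shape[1],
        "n_selected": int(mask.sum()),
        "baseline_accuracy": baseline.score(X_test, y_test),
        "reduced_accuracy": reduced.score(X_test[:, mask], y_test),
    }


if __name__ == "__main__":
    print(run_demo())
```

Feature selection is fitted on the training split only, so the held-out accuracy is not influenced by the test samples. The numbers printed for this synthetic data say nothing about the results reported in the paper.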

Highlights

Feature selection is an NP-hard combinatorial problem [1], in which the number of candidate feature subsets increases exponentially with the number of features [2]
We present the results of the various experiments conducted to evaluate the performance of the proposed ENSVM model
The platform used to test the ENSVM model is a PC with the detailed settings shown in Table 2
Table 4 uses seven benchmark microarray reference datasets with binary classes

Summary

Introduction

Feature selection is known to be an NP-hard combinatorial problem [1], in which the number of candidate feature subsets increases exponentially with the number of features [2]. Improving classification performance requires selecting a significant set of features, which is a major step in most classification methods. Elastic Net (EN) is a regularization method that can model high-dimensional datasets while also performing variable selection. EN modelling is said to handle high-dimensional datasets in problems with small sample sizes [3]. This is not entirely true, as EN modelling relies on the best estimate of the model parameters. If the number of variables (or features) p is much higher than the number of observations (or samples)

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 49	License type: CC BY 4.0
