Abstract

Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets grows exponentially with the number of features. For high-dimensional data, the goal of feature selection is to find the smallest subset of features that is still the most informative. In this paper, we propose a hybrid feature selection optimization model for cancer classification called ENSVM. The model uses the Elastic Net (EN) method, which both regularizes and selects variables, to select genes from genomic microarray data. We applied three optimization techniques, namely Social Ski-Driver (SSD), Randomized SearchCV (RS) and Elastic NetCV (ENCV), to tune the Elastic Net, combined with a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compared the results of applying ENSVM to seven genomic microarray datasets with those of the SSD-SVM model and of an SVM with a radial basis function (RBF) kernel and no feature selection. The comparison demonstrated the effectiveness of ENSVM in selecting the feature subset that maximized classification performance. Accordingly, minimizing the number of features matters when analyzing high-dimensional data, for both performance and accuracy. Moreover, the ENSVM model outperforms the SSD-SVM model.
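For reference, the Elastic Net objective can be written as below. This uses the parameterization of scikit-learn's `ElasticNet`/`ElasticNetCV`, assumed here only because RandomizedSearchCV and ElasticNetCV are scikit-learn names; the paper's exact formulation is not given in this excerpt.

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p}}
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \alpha\,\rho\,\lVert \beta \rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert \beta \rVert_2^2,
\qquad \alpha > 0,\quad 0 \le \rho \le 1,
\qquad \mathcal{S} = \{\, j : \hat{\beta}_j \neq 0 \,\}
```

Here n is the number of samples, X the n × p gene-expression matrix, y the numerically encoded class labels (an assumption for the classification setting), α the overall penalty strength and ρ (`l1_ratio`) the mix between the L1 and L2 penalties; ρ = 1 gives the Lasso and ρ = 0 a pure ridge penalty. Only the genes in the set S, whose coefficients are non-zero, are kept, so tuning α and ρ (for example with SSD, RS or ENCV) controls how many genes reach the SVM.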

Highlights

  • Feature selection is an NP-hard combinatorial problem [1], in which the number of candidate feature subsets increases exponentially with the number of features [2]

  • In the Results and Discussion section, we present the results of the various experiments conducted to evaluate the performance of the proposed ENSVM model

  • The ENSVM model is tested on a PC with the settings detailed in Table 2; Table 4 lists the seven benchmark microarray datasets used, all with binary classes
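To make the first highlight concrete: with p features there are 2^p − 1 non-empty candidate subsets, so exhaustive search is feasible only for very small p. The minimal sketch below (standard library only; the function names and the toy gene list are hypothetical) enumerates the subsets for four features and reports how quickly the count grows; p = 2000 is an arbitrary illustrative size, not one of the paper's datasets.

```python
from itertools import combinations


def count_nonempty_subsets(p: int) -> int:
    """Number of non-empty subsets of p features: 2**p - 1."""
    return 2 ** p - 1


def enumerate_subsets(features):
    """Yield every non-empty subset of `features` (exhaustive search)."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]          # hypothetical toy gene list
    subsets = list(enumerate_subsets(genes))
    print(len(subsets), count_nonempty_subsets(len(genes)))  # prints: 15 15
    # Growth of the search space; p = 2000 is only an illustrative size.
    for p in (10, 20, 50, 2000):
        digits = len(str(count_nonempty_subsets(p)))
        print(f"p = {p}: the subset count has {digits} decimal digits")
```

This only illustrates the size of the search space that makes exhaustive search impractical; it is why heuristic and embedded selection methods are used instead.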



Introduction

Feature selection is an NP-hard combinatorial problem [1], in which the number of candidate feature subsets increases exponentially with the number of features [2]. Selecting a significant subset of features is a major step in most classification methods and is necessary to improve classification performance. Elastic Net (EN) is a regularization method that can model high-dimensional datasets while also performing variable selection. It has been claimed that EN modelling can handle high-dimensional datasets in problems with small sample sizes [3]. This is not entirely true, as EN modelling relies on obtaining the best estimates of the model parameters. If the number of variables (or features) p is much higher than the number of observations (or samples)
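A minimal sketch of how Elastic Net performs variable selection when p > n: cyclic coordinate descent on the scikit-learn-style objective (1/2n)‖y − Xb‖² + α·ρ·‖b‖₁ + α(1 − ρ)/2·‖b‖². The L1 term sets coefficients exactly to zero through soft-thresholding, while the L2 term adds α(1 − ρ) to each update's denominator, shrinking coefficients and keeping the problem strongly convex even with more features than samples. This is a from-scratch, standard-library illustration, not the authors' implementation; the function names, the synthetic data, the ±1 label encoding and the hyperparameter values are all assumptions.

```python
import math
import random


def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0): the L1 shrinkage step."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, max_iter=1000, tol=1e-8):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + alpha*(1-l1_ratio)/2*||b||^2.
    X is a list of n rows of p floats, y a list of n floats. Columns and y
    are mean-centred first, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]      # residual y - Xb, with b = 0 initially
    b = [0.0] * p
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    sq_norms = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]

    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if sq_norms[j] == 0.0:
                continue               # constant column: coefficient stays 0
            # z = (1/n) x_j^T (residual with feature j's contribution added back)
            z = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(z, l1) / (sq_norms[j] + l2)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta   # keep residual in sync with b
                b[j] = new_bj
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 20, 50                      # more features than samples (p > n)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Hypothetical +1/-1 labels driven only by features 0, 1 and 2.
    y = [1.0 if 3 * row[0] - 2 * row[1] + row[2] > 0 else -1.0 for row in X]
    b = elastic_net_cd(X, y, alpha=0.3, l1_ratio=0.7)
    # Indices with non-zero coefficients; the exact set depends on the
    # random draw and on alpha / l1_ratio.
    print("selected features:", [j for j, bj in enumerate(b) if bj != 0.0])
```

In practice an optimized solver such as scikit-learn's `ElasticNet` or `ElasticNetCV` would replace this loop, and only the features with non-zero coefficients would be passed on to the SVM classifier.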
