An Experimental Approach of Applying Boruta and Elastic Net for Variable Selection in Classifying Breast Cancer Datasets

Padmavathi M.S

doi:10.1504/ijkedm.2019.10026410

Abstract

Feature selection identifies the variables that contribute most to predicting an outcome. In this study, we propose Boruta and elastic net (Enet) feature selection for classifying breast cancer datasets. Boruta and Enet are compared with genetic algorithm (GA) and consistency-based subset feature selection; Enet selected the best features for the Wisconsin diagnostic breast cancer (WDBC) and breast cancer datasets. To assess the stability of Enet feature selection, the variable importance reported by machine learning algorithms such as naive Bayes (NB), multilayer perceptron (MLP) and random forest (RF) is evaluated and compared. It is shown that the features obtained by Enet contain all the variables commonly selected by the tested machine learning algorithms. The proposed Enet feature selection combined with MLP classification yields a better receiver operating characteristic (ROC) score (0.990 and 0.687) and a lower root mean squared error (RMSE) (0.159 and 0.429) for the WDBC and breast cancer datasets, respectively, when compared with naive Bayes and RF.
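To illustrate how an elastic net penalty can act as a feature selector, the following is a minimal sketch and not the implementation used in the study, which the abstract does not describe. It assumes a linear (regression) elastic net fitted by coordinate descent on synthetic data, not on the WDBC or breast cancer datasets. The function names (`soft_threshold`, `elastic_net_fit`, `select_features`), the penalty settings and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def elastic_net_fit(columns, target, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
                       + 0.5*alpha*(1 - l1_ratio)*||w||_2^2.
    `columns` is a list of feature columns (one list per feature)."""
    n = len(target)
    cols = [center(c) for c in columns]
    residual = center(target)  # all weights start at zero
    weights = [0.0] * len(cols)
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(x * x for x in col) / n
            # Correlation of feature j with the partial residual.
            rho = sum(x * (r + weights[j] * x)
                      for x, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, l1) / (z + l2)
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(col, residual)]
                weights[j] = new_w
    return weights


def select_features(weights, tol=1e-8):
    """Keep the indices of features whose coefficient is non-zero."""
    return [j for j, w in enumerate(weights) if abs(w) > tol]


# Synthetic data (an assumption for illustration): only features 0 and 1
# influence the target; features 2 to 4 are pure noise.
rng = random.Random(0)
n_samples = 200
columns = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(5)]
target = [3 * columns[0][i] - 2 * columns[1][i] + 0.1 * rng.gauss(0, 1)
          for i in range(n_samples)]

weights = elastic_net_fit(columns, target)
selected = select_features(weights)
print("selected feature indices:", selected)
```

The L1 part of the penalty drives some coefficients exactly to zero, which is what makes the fitted model usable as a selector, while the L2 part stabilizes the fit when features are correlated. In a classification setting such as the one described above, the retained features would then be passed to a classifier such as MLP.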

Full Text