Kernel PCA and SVM-RFE based feature selection for classification of dengue microarray dataset

Elke Annisa Octaria, Alhadi Bustamam, Titin Siswantining, Devvi Sarwinda

doi:10.1063/5.0023930

Abstract

Classification of microarray data is a challenging task because gene expression data have many features (genes) but only a few samples. Feature selection is therefore an important preprocessing step in classifying microarray data: by reducing the number of features, it can improve classification accuracy on high-dimensional data. In this research, we compare two methods, kernel principal component analysis (Kernel PCA) and support vector machine recursive feature elimination (SVM-RFE), both of which are suitable for feature selection. Kernel PCA is an extension of PCA that uses kernel methods and works better on the complicated spatial structures of high-dimensional features, while SVM-RFE is an algorithm that selects genes according to their weights. The data in this paper are a dengue fever microarray dataset taken from the National Center for Biotechnology Information (NCBI). We use a support vector machine classifier to separate the two classes (dengue or healthy). In the experimental results, Kernel PCA and SVM-RFE achieve similar accuracy (ACC), area under the curve (AUC), precision, and recall, but Kernel PCA requires less computational time than SVM-RFE to classify dengue and healthy patients in the dataset. Kernel PCA as a feature selection method is therefore very helpful for improving classification accuracy and reducing time consumption on the dengue microarray dataset.
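The comparison described in the abstract can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification`) rather than the NCBI dengue dataset, and the RBF kernel, the 10 retained components or genes, the RFE step size, and the train/test split are all arbitrary choices that the abstract does not specify. Note that Kernel PCA produces new components rather than a subset of the original genes, whereas SVM-RFE keeps a subset of the original features.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for a microarray matrix
# (few samples, many features); not the dengue data from the paper.
X, y = make_classification(
    n_samples=60, n_features=500, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def evaluate(name, model):
    """Fit the pipeline, time the fit, and score it on the held-out split."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    scores = model.decision_function(X_test)
    return {
        "name": name,
        "acc": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, scores),
        "fit_seconds": fit_seconds,
    }


# Kernel PCA: project onto 10 nonlinear components, then a linear SVM.
kpca_svm = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf"),
    SVC(kernel="linear"),
)

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weight genes
# (10% of the original feature count per round) until 10 remain.
rfe_svm = make_pipeline(
    StandardScaler(),
    RFE(SVC(kernel="linear"), n_features_to_select=10, step=0.1),
    SVC(kernel="linear"),
)

results = [
    evaluate("Kernel PCA + SVM", kpca_svm),
    evaluate("SVM-RFE + SVM", rfe_svm),
]
for r in results:
    print(f"{r['name']}: ACC={r['acc']:.3f} AUC={r['auc']:.3f} "
          f"fit={r['fit_seconds']:.3f}s")
```

On real data, the scores and timings would depend heavily on the kernel, the number of retained features, and the RFE step size, so numbers from this sketch should not be read as reproducing the paper's results.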
