Abstract

Cancer is one of the leading causes of death worldwide. Gene expression profiling based on microarrays is one of the most effective tools for cancer diagnosis, prognosis, and treatment. Each sample in a gene expression dataset typically contains measurements for tens of thousands of genes, so the data are large-scale, high-dimensional, and highly redundant. Classifying gene expression profiles is considered an NP-hard problem, and feature (gene) selection is one of the most effective ways to handle it. This paper presents a hybrid cancer classification approach that combines several machine learning techniques: Pearson's correlation coefficient as a correlation-based feature selector and reducer, a Decision Tree classifier that is easy to interpret and does not require a parameter, and Grid Search CV (cross-validation) to optimize the maximum depth hyperparameter. Seven standard microarray cancer datasets are used to evaluate the model. Several performance measures are employed to assess the model and to identify the most informative and relevant features: classification accuracy, specificity, sensitivity, F1-score, and AUC. According to the results, the proposed approach greatly decreases the number of genes required for classification, selects the most informative features, and increases classification accuracy.

Highlights

  • Cancer is one of the leading causes of death [1], and gene expression profiles derived from microarray data have been identified as promising cancer diagnostic indices [2]. Microarrays measure the interactions of thousands of genes at the same time and give a global picture of cellular function [3, 4]. The classification of microarray data, that is, assigning patient samples to classes based on their gene expression profiles, is one of the most common and important applications of functional genomics [5, 48]

  • Gene expression data are complex and challenging: genes are correlated with one another directly or indirectly, which makes classification very difficult and generally calls for an accurate and powerful feature selection technique

  • To select an informative subset of genes while eliminating redundant or irrelevant ones, and to improve the classification of high-dimensional microarray data, this study introduces a hybrid feature selection approach called PCC-DTCV. It combines Pearson’s correlation coefficient (PCC), a Decision Tree (DT) [11] for classification and feature selection [12, 13, 14, 15, 16, 17], and Grid Search CV as an optimization technique [13, 14, 18, 19, 20, 42, 43, 44, 45, 46, 47] that tunes the DT parameters to obtain the optimal feature subset
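The PCC step named in the highlights can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: it assumes genes are ranked by the absolute Pearson correlation between each gene and a numeric class label, and the function name `pcc_rank_genes`, the cutoff `k`, and the toy data are all hypothetical.

```python
import numpy as np


def pcc_rank_genes(X, y, k):
    """Return the indices of the k genes with the largest |Pearson r| against y.

    X is a (samples x genes) expression matrix and y a numeric class label.
    Ranking against the label is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # center every gene
    yc = y - y.mean()        # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Zero-variance genes get r = 0 instead of a division by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(r), kind="stable")
    return order[:k], r


# Toy data (hypothetical): 20 samples, 50 genes, gene 7 shifted in class 1.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 10)
X = rng.normal(size=(20, 50))
X[:, 7] += 3.0 * y

selected, r = pcc_rank_genes(X, y, k=5)
```

Here `selected` holds the column indices to keep, so `X[:, selected]` is the reduced matrix that would be passed on to the classifier.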

Summary

Introduction

Cancer is one of the leading causes of death [1], and gene expression profiles derived from microarray data have been identified as promising cancer diagnostic indices [2]. To select an informative subset of genes while eliminating redundant or irrelevant ones, and to improve the classification of high-dimensional microarray data, this study introduces a hybrid feature selection approach called PCC-DTCV. It combines Pearson’s correlation coefficient (PCC), a Decision Tree (DT) [11] for classification and feature selection [12, 13, 14, 15, 16, 17], and Grid Search CV as an optimization technique [13, 14, 18, 19, 20, 42, 43, 44, 45, 46, 47] that tunes the DT parameter (max-depth) to obtain the optimal feature subset. According to the experimental results, the proposed method reduces dimensionality, selects the most important and informative features (genes), and improves the identification of cancer tissues from benign tissues.
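The max-depth tuning described above can be sketched with scikit-learn's `GridSearchCV` and `DecisionTreeClassifier`. This is an illustration under stated assumptions, not the paper's configuration: the depth grid, the 5-fold stratified split, the accuracy scoring, and the synthetic data are hypothetical choices.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a PCC-reduced expression matrix (hypothetical data):
# 60 samples, 20 genes, with gene 3 carrying the class signal.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 30)
X = rng.normal(size=(60, 20))
X[:, 3] += 4.0 * y

# Search over the tree's maximum depth with cross-validated accuracy.
search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5]},  # assumed grid
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
# Genes the tuned tree actually splits on have non-zero importance.
used_genes = np.flatnonzero(search.best_estimator_.feature_importances_ > 0)
```

The genes in `used_genes` are the ones the tuned tree relies on, which is one way to read "optimal feature subset" here. In a real evaluation the correlation filter would normally be fitted inside each cross-validation fold so that held-out samples do not influence gene selection.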

Background and Related Work
Preliminaries
Method
Results and Discussion