Abstract

Feature selection and classification are the two main steps in analyzing high-dimensional cancer microarray gene data sets, and both have attracted increasing interest in academia and the medical community. Because cancer gene expression data sets combine small sample sizes, high dimensionality, and class imbalance, extracting useful gene information and classifying samples effectively is challenging. In this paper, we propose a novel feature selection algorithm for classification, ISVM-RFE(FPD), which fully utilizes the classification performance of each feature subset. Compared with existing algorithms, ISVM-RFE(FPD) takes into account not only the intrinsic characteristics of the data but also both linear and nonlinear correlations among features. The experimental results demonstrate that ISVM-RFE(FPD) outperforms existing SVM-based feature selection algorithms in terms of the recall rate of positive samples (rr_p) and G-mean (G).
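The two evaluation measures named above can be illustrated with a short sketch. It assumes the definitions commonly used for imbalanced classification, which this excerpt does not spell out: the recall rate of positive samples is TP / (TP + FN), and G-mean is the geometric mean of the recall on the positive class and the recall on the negative class. The function name and the toy labels are hypothetical.

```python
import math

def recall_and_g_mean(y_true, y_pred, positive=1):
    """Return (rr_p, rr_n, G) for binary labels.

    Assumed definitions (not taken from the paper):
      rr_p = TP / (TP + FN), rr_n = TN / (TN + FP), G = sqrt(rr_p * rr_n).
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    rr_p = tp / (tp + fn) if (tp + fn) else 0.0
    rr_n = tn / (tn + fp) if (tn + fp) else 0.0
    return rr_p, rr_n, math.sqrt(rr_p * rr_n)

# Toy imbalanced example: 4 positive and 12 negative samples.
y_true = [1] * 4 + [0] * 12
# The classifier finds only one positive but gets every negative right.
y_pred = [1, 0, 0, 0] + [0] * 12

rr_p, rr_n, g = recall_and_g_mean(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(rr_p, rr_n, g, accuracy)
```

In this toy case the accuracy is 0.8125 even though three of the four positive samples are missed, whereas rr_p is 0.25 and G is 0.5. This is the usual motivation for reporting these two measures on class-imbalanced data.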

Highlights

  • At present, cancer is the second leading cause of death

  • The experimental results show that ISVM-RFE(FPD) outperforms the existing SVM-based feature selection algorithms with respect to recall rate of positive samples and G-mean (G)

  • The training set (80% of the original data set) is obtained by stratified random sampling, and 3-fold cross-validation is performed on C_list
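The sampling scheme in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that C_list is a list of candidate values for the SVM penalty parameter C and that the cross-validation folds are also stratified; neither is stated in this excerpt. All function names, the candidate values, and the scoring function `dummy_score` are hypothetical placeholders; in practice the scoring function would train an SVM and return a measure such as G-mean.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split sample indices so each class keeps about train_frac in training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)

def stratified_folds(indices, labels, k=3, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def select_c(c_list, folds, score_fn):
    """Return the candidate with the best mean validation score over the folds."""
    best_c, best_score = None, float("-inf")
    for c in c_list:
        scores = []
        for j, val in enumerate(folds):
            fit = [i for m, f in enumerate(folds) if m != j for i in f]
            scores.append(score_fn(c, fit, val))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_c, best_score = c, mean_score
    return best_c

# Toy imbalanced data set: 10 positive and 40 negative samples.
labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(labels, train_frac=0.8)
folds = stratified_folds(train_idx, labels, k=3)

# Hypothetical candidates and a placeholder score that ignores the data.
C_list = [0.01, 0.1, 1.0, 10.0]
def dummy_score(c, fit_idx, val_idx):
    return -abs(c - 1.0)

best_C = select_c(C_list, folds, dummy_score)
print(len(train_idx), len(test_idx), [len(f) for f in folds], best_C)
```

Splitting within each class keeps the positive-to-negative ratio of the training set and of every fold close to that of the full data set, which matters when the positive class has only a handful of samples.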

Summary

Introduction

Cancer is the second leading cause of death; about 1 in 6 deaths is due to cancer [1]. Most cancers are almost impossible to treat in their advanced stages, yet most patients would still recover if the disease were diagnosed at an early stage. To improve survival and cure rates, we therefore need to analyze the corresponding early-diagnosis data sets. Because samples are costly to obtain, the number of samples in most gene expression data sets for cancer classification (usually only tens to hundreds) is very small compared with the number of genes (usually thousands). Selecting a small number of genes that carry as much information as possible from this large volume of cancer microarray gene data is a crucial and challenging problem.
