Abstract

Leukemia is one of the leading and most harmful cancers worldwide. A large number of genes are implicated in cancer, so it is necessary to identify the most informative genes for leukemia. The main objectives of this study are to: (i) identify the most informative genes using five feature selection techniques (FST) and (ii) apply six classifiers to classify the disease and compare their performance. The leukemia data were taken from the Kent Ridge biomedical data repository, USA. The dataset contains 7129 genes and 72 patients, of whom 47 are cancer cases and 25 are controls. We used five FST: the t-test, the Wilcoxon sign rank sum (WCSRS) test, random forest (RF), Boruta, and the least absolute shrinkage and selection operator (LASSO). We also used six classifiers: AdaBoost (AB), classification and regression tree (CART), artificial neural network (ANN), random forest (RF), linear discriminant analysis (LDA), and naive Bayes (NB). The performance of these classifiers is evaluated by accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and F-measure (FM). We used a simulated dataset to check the validity of the proposed method. The results indicate that the combination of LASSO-based FST and the NB classifier gives the highest classification accuracy, 99.95%. On the basis of these results, we conclude that the combination of LASSO-based FST and the NB classifier predicts leukemia more accurately than any other combination of FST and classifier used in this study.
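The six performance measures named above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that computation, not the authors' code: the function names are hypothetical, the cancer class is assumed to be the positive class (label 1), FM is assumed to be the F1 score (the harmonic mean of PPV and SE), and the example predictions are invented purely for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # A measure is undefined (NaN) when its denominator is zero.
    return num / den if den else math.nan


def classification_metrics(y_true, y_pred, positive=1):
    """Return ACC, SE, SP, PPV, NPV and FM (assumed here to be F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),
        "SE": _ratio(tp, tp + fn),    # sensitivity (recall)
        "SP": _ratio(tn, tn + fp),    # specificity
        "PPV": _ratio(tp, tp + fp),   # positive predictive value (precision)
        "NPV": _ratio(tn, tn + fn),   # negative predictive value
        "FM": _ratio(2 * tp, 2 * tp + fp + fn),  # F1 = 2*PPV*SE / (PPV + SE)
    }


if __name__ == "__main__":
    # Class sizes match the abstract (47 cancer, 25 control);
    # the predictions themselves are made up for this example.
    y_true = [1] * 47 + [0] * 25
    y_pred = [1] * 45 + [0] * 2 + [0] * 24 + [1] * 1
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

With only 72 patients and unequal class sizes, accuracy alone can hide poor performance on the smaller control group, which is one reason to report SE, SP, PPV, and NPV alongside it.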

Highlights

  • Cancer is currently one of the most important health burdens worldwide

  • Our study showed that the least absolute shrinkage and selection operator (LASSO) feature selection technique (FST) combined with a naive Bayes (NB) classifier gives the best classification accuracy, along with higher values on the other performance measures

  • This study presents a comprehensive evaluation of leukemia gene expression classification under the two major criteria

Introduction

Cancer is currently one of the most important health burdens worldwide. It arises when cell division becomes uncontrolled [1]. Leukemia, a group of blood cancers, is one of the leading and most harmful cancers. It begins in the bone marrow and spreads via blood cells [3]. Global gene expression analysis was proposed as an approach to the problem of cancer classification [6,7,8]. Microarray technology has underpinned both the simultaneous monitoring of genes and cancer classification. Early results obtained with this approach were promising.
