Abstract
Blood cancer has been a growing concern during the last decade and requires early diagnosis so that proper treatment can begin. The diagnosis process is costly and time-consuming, involving medical experts and several tests. An automatic system that predicts the disease accurately is therefore of significant importance. Diagnosis of blood cancer using leukemia microarray gene data and machine learning has become an important area of medical research. Despite these research efforts, accuracy and efficiency still need improvement. This study proposes a supervised machine learning approach for blood cancer prediction. The study uses a leukemia microarray gene dataset containing 22,283 genes. ADASYN resampling and Chi-squared (Chi2) feature selection are used to address the imbalanced and high-dimensional nature of the dataset: ADASYN generates synthetic samples so that each target class is balanced, and Chi2 selects the best features out of the 22,283 for training the learning models. For classification, a hybrid logistics vector trees classifier (LVTrees) is proposed, which combines logistic regression, a support vector classifier, and an extra tree classifier. In addition to extensive experiments on the datasets, the proposed approach is compared with state-of-the-art methods to assess its significance. With ADASYN and Chi2, LVTrees outperforms all other models, achieving 100% accuracy. A statistical significance T-test is also performed to show the efficacy of the proposed approach. Results using k-fold cross-validation confirm the superiority of the proposed model.
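The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several points are assumptions: the three base models are combined by soft voting (the abstract does not say how they are combined), features are scaled to a non-negative range before Chi2 scoring, and the data is a small synthetic stand-in rather than the 22,283-gene dataset. The function name `build_lvtrees_sketch`, the number of selected features, and all hyperparameters are hypothetical. ADASYN resampling is left out of the runnable code because it lives in the separate imbalanced-learn package; a comment shows where it would be applied.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_lvtrees_sketch(k_features=50, seed=0):
    """Chi2 feature selection followed by an LR + SVC + extra-trees ensemble.

    Assumption: the three models are combined by soft voting.
    """
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            # probability=True is needed so SVC can take part in soft voting.
            ("svc", SVC(kernel="linear", probability=True, random_state=seed)),
            ("et", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
        ],
        voting="soft",
    )
    return Pipeline([
        # Assumption: chi2 needs non-negative inputs, so scale to [0, 1] first.
        ("scale", MinMaxScaler()),
        ("chi2", SelectKBest(chi2, k=k_features)),
        ("clf", ensemble),
    ])


# Synthetic stand-in for microarray data (hypothetical sizes and class count).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=20,
    n_classes=3, random_state=0,
)

# With imbalanced-learn installed, ADASYN would be applied to the training
# data only, for example:
#   from imblearn.over_sampling import ADASYN
#   X_res, y_res = ADASYN(random_state=0).fit_resample(X_train, y_train)

model = build_lvtrees_sketch(k_features=50)
scores = cross_val_score(model, X, y, cv=5)  # stratified 5-fold for classifiers
print("mean CV accuracy:", scores.mean())
```

Putting the selection step inside the pipeline means the features are re-selected on each training fold, which avoids leaking test-fold information into the Chi2 scores during cross-validation.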
Highlights
Blood cancer has been a growing concern during the last decade and requires early diagnosis so that proper treatment can begin.
Logistic regression (LR), k-nearest neighbor (KNN), and the logistics vector trees classifier (LVTrees) outperform all other models in terms of accuracy score.
The AdaBoost classifier (ADA) performs poorly because the dataset is small: its boosting approach requires a large number of records to improve accuracy.
Summary
Blood cancer has been a growing concern during the last decade and requires early diagnosis so that proper treatment can begin. Diagnosis of blood cancer using leukemia microarray gene data and machine learning has become an important area of medical research. This study proposes a supervised machine learning approach for blood cancer prediction. Diagnosis and prediction have been considered prudent ways to reduce cancer deaths worldwide. In this regard, this study focuses on the prediction of blood cancer. Gal et al.[9] used KNN, SVM, and RF classifiers, achieving accuracy scores of 84%, 74%, and 81%, respectively. Despite such efforts to improve the performance of machine learning and deep learning classifiers, the desired accuracy has not been reached for blood cancer prediction. The chief objective of the current study is to propose an approach that can perform blood cancer prediction with high accuracy using microarray gene data.