Abstract

Microarray technology enables biologists to monitor the activity of genome-wide features simultaneously. It generates gene expression data that can be used to classify cancers, and such data are now used in disease diagnosis. However, gene expression data are high-dimensional and contain many noisy, irrelevant, and redundant genes that are not needed for classification. The curse of dimensionality and data sparsity therefore complicate the classification of gene expression profiles, and the task is computationally demanding because the data contain a vast number of genes but relatively few samples. To overcome these challenges, we present an efficient dimensionality reduction approach for microarray gene expression data and propose machine learning–based strategies for classifying acute myeloid leukemia and acute lymphoblastic leukemia from microarray gene expression profiles. We employed logistic regression, an extremely randomized trees classifier, a ridge classifier, an AdaBoost classifier, linear discriminant analysis, random forest, gradient boosting, and a k-nearest neighbors classifier. Principal component analysis was used for dimensionality reduction. We used two distinct cross-validation procedures because they produce more accurate assessments of model skill than previously used strategies. Six classification performance measures were used to evaluate these approaches. Logistic regression with eightfold cross-validation achieved a classification accuracy of more than 99%. Compared with state-of-the-art methods, our approach performed better in terms of both accuracy and the computational time required to develop the model.
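
The pipeline described above (principal component analysis followed by a classifier, evaluated with eightfold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic matrix in place of real microarray data, and the sample count, gene count, class sizes, scaling step, and number of principal components are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a microarray matrix: few samples, many genes.
# All sizes below are hypothetical, not taken from the paper.
n_samples, n_genes = 72, 2000
# Hypothetical labels: 0 = acute lymphoblastic, 1 = acute myeloid leukemia.
y = np.array([0] * 40 + [1] * 32)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :50] += 1.5  # shift 50 "genes" so the classes are separable

# Scaling and PCA sit inside the pipeline so they are refit on each
# training fold, which keeps held-out samples out of the projection.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),          # assumed number of components
    LogisticRegression(max_iter=1000),
)

# Eightfold stratified cross-validation, one accuracy score per fold.
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("mean accuracy over 8 folds:", scores.mean())
```

Placing the dimensionality reduction inside the cross-validated pipeline is a design choice of this sketch; the abstract does not state whether the reduction was fit per fold or on the full data set.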
