Abstract

The discovery of diagnostic or prognostic biomarkers is fundamental to optimizing therapeutics for patients. By enhancing the interpretability of the prediction model, this work aims to optimize leukemia diagnosis while retaining high classification performance in the identification of informative genes. For this purpose, we applied an optimally parameterized Kernel Logistic Regression (KLR) method to the classification of leukemia microarray gene expression data, using metalearners to select attributes and thereby reduce the data dimensionality before passing the data to the classifier. Pearson correlation and the chi-squared statistic were the attribute evaluators applied in the metalearners, with information gain as the single-attribute evaluator. The implemented models were evaluated with 10-fold cross-validation. The metalearner approach identified 12 common genes, with a highest average merit of 0.999. The practical work was carried out using the public data mining software WEKA.
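To make the single-attribute evaluation step concrete, the following is a minimal sketch of ranking attributes by information gain, that is, the class entropy minus the class entropy conditioned on the attribute. It is not the WEKA implementation used in this work: the gene names, the toy samples, and the assumption that expression values are already discretized into "low" and "high" are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class within each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six samples, two discretized "genes".
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
columns = {
    "gene_a": ["low", "low", "low", "high", "high", "high"],  # separates the classes
    "gene_b": ["low", "high", "low", "high", "low", "high"],  # weakly informative
}

ranking = rank_attributes(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy case the attribute that separates the two classes perfectly is ranked first. A real pipeline would also need a discretization step for continuous expression values, which this sketch omits.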

Highlights

  • The type of leukemia is determined by the stage of development of the cell when it becomes malignant or cancerous

  • We applied an optimally parameterized Kernel Logistic Regression method to the classification of leukemia microarray gene expression data, using metalearners to select attributes and thereby reduce the data dimensionality before passing the data to the classifier

  • The objective of this work was to identify an optimal subset of genes as the best diagnostic markers for leukemia, inferred from the best classification performance obtained with Kernel Logistic Regression (KLR)


Summary

Introduction

The type of leukemia is determined by the stage of development of the cell when it becomes malignant or cancerous. Acute lymphoblastic leukemia (ALL) is the most common type of leukemia in childhood and affects the lymphoid line of blood cells [1]. Acute myeloid leukemia (AML) affects the myeloid line of blood cells and is a fast-growing cancer of the blood and bone marrow. The occurrence of a cancer or cancer subtype can be determined through informative genes, considering their expression patterns and their correlation with the cancer type. For this purpose, statistical methods and machine learning techniques can be employed for feature selection, thereby prioritizing informative genes. The Kernel Logistic Regression (KLR) model is a statistical classifier [2] that fits a model by minimizing the negative log-likelihood with a quadratic penalty using Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization [3].
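As an illustration of the objective just described, one common way to write a penalized negative log-likelihood for a two-class kernel logistic regression is sketched below. The notation is an assumption introduced here and is not taken from the source: training pairs $(x_i, y_i)$ with $y_i \in \{0, 1\}$, a kernel function $k$, coefficients $\alpha$, an intercept $b$, and a penalty weight $\lambda$. The exact form and scaling of the quadratic penalty may differ in the implementation used in this work.

```latex
\begin{align}
  f(x) &= \sum_{j=1}^{n} \alpha_j \, k(x_j, x) + b, \\
  p(y = 1 \mid x) &= \frac{1}{1 + e^{-f(x)}}, \\
  J(\alpha, b) &= -\sum_{i=1}^{n} \Big[ y_i \log p(y = 1 \mid x_i)
      + (1 - y_i) \log \big( 1 - p(y = 1 \mid x_i) \big) \Big]
      + \lambda \, \alpha^{\top} K \alpha,
  \qquad K_{ij} = k(x_i, x_j).
\end{align}
```

Under these assumptions, $J$ is a smooth function of $(\alpha, b)$, which is what makes a quasi-Newton method such as BFGS applicable: it needs only the value of $J$ and its gradient at each step.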

