Abstract As the most common pediatric malignancy, B-cell acute lymphoblastic leukemia (B-ALL) comprises multiple distinct subtypes characterized by recurrent and sporadic somatic and germline genetic alterations such as chromosomal alteration, transcription factor rearrangement, or kinase inhibition. Treatment of B-ALL patients is personalized by subtype, because treatment responses may vary considerably across subtypes. Identifying B-ALL subtypes can therefore facilitate risk stratification and enable tailored therapeutic approaches. Existing methods for B-ALL subtyping depend primarily on immunophenotypic, cytogenetic, and genomic analyses, which can be costly, complicated, and laborious in clinical practice. To overcome these challenges, we present RanBALL (an Ensemble Random Projection-Based Model for Identifying B-Cell Acute Lymphoblastic Leukemia Subtypes), an accurate and cost-effective model for B-ALL subtype identification based on transcriptomic profiling only. RanBALL leverages random projection (RP) to construct an ensemble of dimension-reduced multi-class classifiers for B-ALL subtyping. Specifically, the transcriptomic profiling features were projected onto low-dimensional spaces by random projection matrices whose elements follow a distribution with zero mean and unit variance. To ensure reliable and robust performance, we selected 20 subspace dimensions ranging from 600 to 2500 in steps of 100. The transformed low-dimensional data matrices were used to train an ensemble of multi-class support vector machine (SVM) classifiers, each corresponding to one RP matrix of a given dimension. The predicted probability scores for each B-ALL subtype were then integrated across the classifiers to determine the final prediction.
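The ensemble described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RanBALL implementation: the class name RPEnsembleSVM, the linear kernel, the 1/sqrt(k) scaling of the projected data, and the use of a plain average to integrate the per-classifier probabilities are all choices made here for illustration. Only the zero-mean, unit-variance projection entries and the 20 dimensions from 600 to 2500 come from the abstract.

```python
import numpy as np
from sklearn.svm import SVC

# The 20 subspace dimensions named in the abstract: 600, 700, ..., 2500.
SOURCE_DIMS = list(range(600, 2501, 100))


class RPEnsembleSVM:
    """Hypothetical sketch: one SVM per random-projection dimension."""

    def __init__(self, dims, seed=0):
        self.dims = list(dims)
        self.seed = seed

    @staticmethod
    def _project(X, R):
        # Scaling by 1/sqrt(k) is an assumption; it keeps distances comparable.
        return X @ R / np.sqrt(R.shape[1])

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.members_ = []
        for k in self.dims:
            # Entries drawn i.i.d. from N(0, 1): zero mean, unit variance.
            R = rng.standard_normal((X.shape[1], k))
            # Linear kernel is an assumption; the abstract does not name one.
            clf = SVC(kernel="linear", probability=True, random_state=self.seed)
            clf.fit(self._project(X, R), y)
            self.members_.append((R, clf))
        # Every member sees the same labels, so class order is shared.
        self.classes_ = self.members_[0][1].classes_
        return self

    def predict_proba(self, X):
        # Averaging is one possible way to "integrate" the scores (assumption).
        probs = [clf.predict_proba(self._project(X, R)) for R, clf in self.members_]
        return np.mean(probs, axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


if __name__ == "__main__":
    # Synthetic stand-in for expression data: 3 classes, 200 features.
    rng = np.random.default_rng(1)
    centers = rng.normal(0, 3, size=(3, 200))
    y = np.repeat(np.arange(3), 40)
    X = centers[y] + rng.normal(0, 1, size=(120, 200))
    model = RPEnsembleSVM(dims=[20, 30, 40]).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

The demo uses small dimensions so that it runs quickly on synthetic data; with real transcriptomic profiles one would presumably pass SOURCE_DIMS instead, provided the number of input features exceeds the largest target dimension.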
Ten repetitions of 10-fold cross-validation on >1700 B-ALL patients showed that the proposed model achieved an accuracy of 93.7%, indicating promising prediction capabilities of RanBALL for B-ALL subtyping. Furthermore, tests with 30% of the data held out suggested that the model was robust and consistent, maintaining high confidence levels for accurate predictions. The high accuracy of RanBALL suggests that our model can effectively capture underlying patterns in transcriptomic profiles for accurate B-ALL subtype identification. To extend the impact of RanBALL, we have released a free, publicly available Python package at https://github.com/wan-mlab/RanBALL. We believe RanBALL will facilitate the discovery of B-ALL subtype-specific marker genes and therapeutic targets, and eventually have a positive impact on downstream risk stratification and tailored treatment design.
Citation Format: Lusheng Li, Hanyu Xiao, Joseph D. Khoury, Jieqiong Wang, Shibiao Wan. RanBALL: Identifying B-cell acute lymphoblastic leukemia subtypes based on an ensemble random projection model [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4907.
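The evaluation protocol reported in the abstract (repeated 10-fold cross-validation plus a 30% held-out test) might be set up roughly as follows. This is a sketch under assumptions: the function name evaluate is hypothetical, the splits are stratified by subtype (the abstract does not say whether they were), and a plain linear SVM on synthetic blobs stands in for the actual model and data.

```python
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.svm import SVC


def evaluate(clf, X, y, seed=0):
    """Return (mean repeated-CV accuracy, accuracy on a 30% held-out split)."""
    # 10 repetitions of 10-fold CV gives 100 train/test splits in total.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=seed)
    cv_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

    # Separate check: train on 70% of the samples, test on the remaining 30%.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    holdout_acc = clone(clf).fit(X_tr, y_tr).score(X_te, y_te)
    return float(cv_scores.mean()), float(holdout_acc)


if __name__ == "__main__":
    # Synthetic stand-in data; real input would be expression profiles.
    X_demo, y_demo = make_blobs(
        n_samples=150, centers=3, n_features=50, random_state=0
    )
    cv_acc, holdout = evaluate(SVC(kernel="linear"), X_demo, y_demo)
    print(f"repeated 10-fold CV accuracy: {cv_acc:.3f}")
    print(f"30% held-out accuracy: {holdout:.3f}")
```

Any estimator that follows the scikit-learn fit/score interface could be passed in place of the stand-in classifier.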