Introduction: Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy and shows high clinical and biological heterogeneity. Despite advances in treatment, patients who relapse still face substantial mortality and long-term complications. Risk-adapted strategies have been implemented to tailor treatment intensity to prognostic factors. Genomic techniques offer potential for improved risk stratification, but accurate risk grouping remains challenging. Epigenetic alterations, particularly DNA methylation, have shown promise for tumor classification and prognostication. In this study, we trained machine learning survival models based on DNA methylation signatures to refine risk grouping in pediatric ALL patients.

Methods: Clinical annotation and DNA methylation data from pediatric ALL samples were retrieved from the Norlund et al. cohort (n = 763). Age, sex, risk group, and cytogenetic subtype were selected as clinical covariates. The cohort was randomly divided into training (80%) and test (20%) sets. Univariate Cox regression and variable importance analysis were performed to select CpG sites associated with relapse-free survival (RFS) and overall survival (OS). Random survival forest models were trained on the training set, and their performance was evaluated on the test set using the concordance index (c-index), the time-dependent area under the ROC curve (AUC), and the continuous ranked probability score (CRPS). To assess the generalizability of the risk predictors, we performed external validation using two independent pediatric ALL datasets (Busche et al., n = 42; Krali et al., n = 384).

Results: The relapse risk predictor (RRP) was constructed using random survival forests based on a signature of 16 CpG sites. The model achieved good predictive accuracy, with c-indexes of 0.667 and 0.677 in the training and test sets, respectively.
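The c-index values above measure how often, among comparable patient pairs, the model assigns the higher risk score to the patient whose event (relapse or death) occurs first; 0.5 corresponds to random ranking and 1.0 to perfect ranking. The abstract does not state which estimator was used, so the sketch below assumes Harrell's classic version. The function name `harrell_c_index` and the toy data are illustrative only, and tie handling is simplified (pairs with equal follow-up times are skipped).

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (e.g. relapse or death) was observed, 0 if censored
    risk_scores: model output where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time ends in an observed
        # event; pairs with tied times are skipped in this simplified sketch.
        if not events[i] or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier event received the higher risk score
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: 4 patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))
```

In the toy data, five pairs are comparable and four are ranked correctly, so the index is 0.8.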
The addition of cytogenetic subtype or age at diagnosis did not significantly change the model's performance. Longitudinal assessment showed that the RRP outperformed clinical risk grouping. Combining the RRP with clinical risk grouping improved prognostic accuracy, with a 20-month AUC above 80%. The mortality risk predictor (MRP) was constructed using a signature of 53 CpG sites. The model achieved strong predictive performance, with c-indexes of 0.751 and 0.755 in the training and test sets, respectively. As with the RRP, the addition of cytogenetic subtype or age at diagnosis did not significantly affect the model's performance. Longitudinal assessment showed that the MRP outperformed clinical risk grouping at all evaluated time points. Combining the MRP with clinical risk grouping yielded the highest prognostic accuracy. In external validation in the Krali dataset, the MRP score was significantly associated with OS (c-index 0.621; p = 1.06 × 10⁻⁴). The hazard ratio was 1.073 (95% confidence interval: 1.035-1.112) per one-unit increase in the risk score. In contrast, the RRP showed lower reproducibility (c-index 0.529), presumably because more recent protocols incorporate minimal residual disease (MRD)-driven risk stratification.

Conclusions: Machine learning models built on DNA methylation signatures outperformed traditional clinical risk grouping in predicting both relapse and mortality, while the combination of molecular and clinical factors provided the best prognostic accuracy. Further validation and implementation of these predictors could contribute to personalized, risk-adapted treatment strategies for pediatric ALL patients.
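Because the MRP hazard ratio is reported per one-unit increase in the risk score, it can be rescaled to a larger score difference if one assumes a Cox proportional-hazards model in which the log-hazard is linear in the score. That assumption is ours: the abstract does not describe the model specification or the range of the MRP score. The helper name `scale_hazard_ratio` and the 10-point difference below are hypothetical and chosen purely for illustration.

```python
import math


def scale_hazard_ratio(hr, ci_low, ci_high, delta):
    """Rescale a per-unit hazard ratio and its CI to a `delta`-unit difference.

    Assumes a log-linear (Cox) effect of the score, so
    HR(delta) = exp(delta * log(HR)) = HR ** delta; the confidence bounds,
    which are computed on the log scale, transform the same way.
    """
    return tuple(math.exp(delta * math.log(x)) for x in (hr, ci_low, ci_high))


# Per-unit estimate reported for the MRP in the Krali dataset.
hr10, lo10, hi10 = scale_hazard_ratio(1.073, 1.035, 1.112, delta=10)
print(f"HR for a hypothetical +10-point difference: {hr10:.2f} "
      f"(95% CI {lo10:.2f}-{hi10:.2f})")
```

Exponentiating on the log scale keeps the interval asymmetric around the point estimate, as it is for the reported per-unit interval.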