In 2022, both the World Health Organization (WHO) and the European LeukemiaNet (ELN) highlighted the formidable challenge of classifying certain acute leukemia subtypes, such as AML with rare recurring translocations, AML with complex karyotypes, AML not otherwise specified (NOS), and myelodysplastic-like entities. As new variants and fusion partners are increasingly characterized, it becomes difficult to determine which of them, alone or in combination, give rise to which leukemia phenotypes. In this context, the WHO described methylation profiling as “the technology at present best suited for addressing lineage and thereby the cell population of origin of tumors” (PMID: 34921008), which creates an opportunity to evaluate the DNA methylome as a clinical diagnostic avenue in hematopoietic malignancies such as AML. We therefore hypothesize that the DNA methylome holds promise for further refining AML classification. In this study, we describe the assembly of the largest publicly available DNA methylation dataset of acute leukemias to date. It combines 11 high-quality clinical trials and studies: NOPHO ALL92-2000 (n=796), AAML0531 (n=628), AAML1031 (n=581), BeatAML (n=316), TCGA AML (n=194), French GRAALL 2003-2005 (n=153), TARGET ALL (n=131), CETLAM SMD-09 (n=83), AAML03P1 (n=72), Japanese AML05 (n=64), and CCG2961 (n=41), for a total of 3,059 subjects after preprocessing. Samples were obtained from either bone marrow or peripheral blood. DNA methylation (meDNA) data were generated on Illumina 450k or EPIC methylation arrays, which share 452,453 probes with the same chemistry and design. To independently validate the findings from the discovery cohort, we processed in parallel meDNA array data from diagnostic bone marrow specimens of AML patients treated on the multicenter clinical trials AML02 (n=159) and AML08 (n=42), led by St. Jude Children's Research Hospital. Raw data were processed with SeSAMe following best practices from the literature (PMID: 30085201, 27924034).
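The cohort arithmetic and the platform harmonization step can be sketched briefly in Python. SeSAMe is an R/Bioconductor package and is not called here. This minimal sketch assumes only that combining 450k and EPIC data means restricting every sample to the probe IDs present on both arrays. The function names (`shared_probes`, `restrict_to_probes`), the per-sample dictionary layout, and the toy probe IDs are hypothetical.

```python
# Cohort sizes after preprocessing, as reported in the abstract.
DISCOVERY_COHORTS = {
    "NOPHO ALL92-2000": 796,
    "AAML0531": 628,
    "AAML1031": 581,
    "BeatAML": 316,
    "TCGA AML": 194,
    "French GRAALL 2003-2005": 153,
    "TARGET ALL": 131,
    "CETLAM SMD-09": 83,
    "AAML03P1": 72,
    "Japanese AML05": 64,
    "CCG2961": 41,
}
VALIDATION_COHORTS = {"AML02": 159, "AML08": 42}


def shared_probes(*platform_probe_sets):
    """Return the probe IDs present on every platform (e.g. 450k and EPIC)."""
    if not platform_probe_sets:
        return set()
    return set.intersection(*platform_probe_sets)


def restrict_to_probes(samples, probes):
    """Project each sample onto a fixed, sorted list of probes.

    `samples` maps sample ID -> {probe ID: beta value} (hypothetical layout).
    A probe missing from a sample becomes None, to be filtered or imputed later.
    """
    ordered = sorted(probes)
    return {sid: [betas.get(p) for p in ordered] for sid, betas in samples.items()}


# Totals across cohorts.
n_discovery = sum(DISCOVERY_COHORTS.values())
n_validation = sum(VALIDATION_COHORTS.values())

# Toy probe IDs and beta values (hypothetical, not taken from the array manifests).
probes_450k = {"cg_a", "cg_b", "cg_c"}
probes_epic = {"cg_b", "cg_c", "cg_d"}
common = shared_probes(probes_450k, probes_epic)
matrix = restrict_to_probes(
    {"s1": {"cg_a": 0.1, "cg_b": 0.8, "cg_c": 0.4},
     "s2": {"cg_b": 0.7, "cg_d": 0.2}},
    common,
)
```

With the real array manifests, the intersection would correspond to the 452,453 shared probes. The later reduction to 319,738 CpGs comes from processing steps whose exact filters the abstract does not describe.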
To build the atlas, we used Pairwise Controlled Manifold Approximation (PaCMAP), a recently developed unsupervised dimensionality reduction algorithm. PaCMAP compressed 319,738 processed CpG values into two components for visualization (Figure 1) and five components for downstream classification. To empirically assess how well the PaCMAP embeddings support classification, we implemented a machine learning pipeline with hyperparameter tuning and 10-fold cross-validation, and we measured per-class accuracy in the discovery and validation cohorts. Because not all samples had clinical diagnostic annotations, only annotated samples were used in the supervised model (n=1,399 in discovery and n=110 in validation). The resulting atlas reveals clusters of samples that define six hematopoietic lineages: AML, ALL, MDS-related or secondary myeloid neoplasms (MDS-like), acute promyelocytic leukemia (APL), mixed phenotype leukemia, and otherwise-normal controls. These lineages are further subdivided into 30 subclasses that overlap with WHO 2022 and ELN 2022 clinical diagnostic annotations. Comparing methylation-based subtype predictions with clinically annotated subtypes yielded an overall concordance of 0.936. Per-class 10-fold cross-validation concordance ranged from 0.82 for MDS-like to 1.00 for APL and NUP214 fusions. Importantly, in the validation (test) cohort, per-class accuracy was 0.91 for AML with KMT2A rearrangements (KMT2A-r; n=47) and 1.00 for AML with rare recurring translocations (n=10). These subtypes are genomically heterogeneous and may be better represented at the epigenomic level. Moreover, they are largely considered standard- to high-risk groups with poor prognosis. This motivates further studies to uncover the biological mechanisms behind the methylation patterns that drive this classification.
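The abstract does not name the supervised model or its tuning grid. The following minimal, standard-library-only sketch therefore illustrates only the evaluation logic: stratified 10-fold cross-validation over low-dimensional embeddings, scored as overall accuracy and per-class accuracy. Three assumptions are worth stating plainly:

- A nearest-centroid classifier stands in for the unspecified model.
- Hyperparameter tuning is omitted.
- "Per-class accuracy" is taken to mean the fraction of samples in each annotated class that are predicted correctly (per-class recall).

All function names are hypothetical. In the study, the five-component embeddings would come from PaCMAP, which is not called here. The toy clusters below are synthetic.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            fold_of[i] = (offset + j) % k
        offset += len(indices)  # keep dealing where the previous class stopped
    return fold_of


def fit_centroids(X, y):
    """Mean embedding per class (stand-in for the unspecified supervised model)."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def cross_validate(X, y, k=10, seed=0):
    """k-fold CV; returns overall accuracy and per-class accuracy (per-class recall)."""
    folds = stratified_folds(y, k, seed)
    correct, totals = Counter(), Counter(y)
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        held_out = [i for i in range(len(y)) if folds[i] == f]
        if not held_out:
            continue
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            if predict(centroids, X[i]) == y[i]:
                correct[y[i]] += 1
    overall = sum(correct.values()) / len(y)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    return overall, per_class


# Synthetic 5-D "embeddings": three well-separated toy clusters (not real data).
rng = random.Random(1)
X, y = [], []
for label, center in [("AML", 0.0), ("ALL", 10.0), ("APL", 20.0)]:
    for _ in range(30):
        X.append([center + rng.gauss(0, 1) for _ in range(5)])
        y.append(label)

overall, per_class = cross_validate(X, y, k=10)
```

Stratifying the folds keeps small classes represented in every held-out fold, which matters when per-class scores are reported for rare subtypes.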
Finally, the resulting classifier predicted a WHO/ELN clinical diagnosis for 91 samples in the validation cohort that had previously been categorized as “Normal Karyotype” or “Other”, or left blank. In conclusion, our study demonstrates the potential of a methylome atlas to enhance the diagnosis of AML subtypes.
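Before prediction, each sample is triaged. Samples with a usable clinical label are kept for evaluation, and those labeled “Normal Karyotype” or “Other”, or left blank, are routed to the trained classifier. The minimal sketch below shows only that triage step. The function names, the hypothetical sample IDs, and the case-insensitive, whitespace-tolerant matching of placeholder labels are assumptions, not details reported in the study.

```python
# Placeholder labels treated as "no usable diagnosis" (assumption based on the abstract's wording).
UNINFORMATIVE = {"normal karyotype", "other", ""}


def needs_prediction(label):
    """True when a clinical label is missing or is one of the placeholder categories."""
    return label is None or label.strip().lower() in UNINFORMATIVE


def split_for_prediction(annotations):
    """Split sample IDs into (annotated, to_predict), preserving input order."""
    annotated, to_predict = [], []
    for sample_id, label in annotations.items():
        (to_predict if needs_prediction(label) else annotated).append(sample_id)
    return annotated, to_predict


# Toy annotations (hypothetical sample IDs and labels).
annotations = {
    "s1": "AML with KMT2A-r",
    "s2": "Normal Karyotype",
    "s3": None,
    "s4": "Other",
    "s5": "  ",
    "s6": "APL",
}
annotated, to_predict = split_for_prediction(annotations)
```

The IDs in `to_predict` would then be passed to the trained classifier, and the annotated IDs would be used to score it.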