The diverse molecular pathogenesis of MDS underpins its pathomorphologic and clinical heterogeneity, ultimately converging in a stepwise progression to leukemia. The identification of hallmark morphological features, hampered by inter-observer variability, has limited the precise resolution of pathogenic differences; this limitation was only partially mitigated by the inclusion of cytogenetic abnormalities. Morphology and cytogenetics remain the two pillars of MDS classification. The advent of NGS led to the discovery of a multitude of mutations, and their combinations unveiled a tremendous molecular diversity in MDS. Intuitively, molecular patterning has been used in a supervised fashion to fit morphologic classification, but this approach has had only limited success. Machine learning (ML) analytic methods, as demonstrated in acute myeloid leukemia (AML), provide new opportunities to integrate molecular data and identify functionally convergent patterns manifesting with similar clinical outcomes. In particular, ML approaches can identify cryptic molecular interactions and pathogenic mechanisms irrespective of their often comparable impact on outcomes. Here, we took advantage of a large, well-annotated multicenter cohort of MDS and secondary AML (sAML) cases (N=3588) to test the hypothesis that related molecular patterns can be analyzed in an unbiased, unsupervised fashion to characterize molecularly defined configurations. Our aim was not to compete with or replace current prognostication tools (including the new molecular prognostic score)2 but rather to illuminate the genetic sub-classification of MDS and related conditions in an operator-independent fashion, according to molecular correlations and mutual functional proximity, irrespective of their impact on clinical outcomes. Molecular signatures driving each cluster were identified and used for genomic subclassification that separated functionally related subtypes in an outcome-independent fashion.
We then validated the model on 412 patients from two external institutions. Among the 3588 patients included in this cohort, 735 (20%) had sAML, 774 (22%) had higher-risk MDS (HR-MDS), and 2079 (58%) had lower-risk MDS (LR-MDS). An abnormal karyotype was found in 43% of the cases, and 77% had at least one somatic mutation, with 8% harboring >4 mutations. ML analysis identified 14 distinctive molecular clusters (MCs) with unique pathobiological associations, treatment responses, and prognosis (Fig. 1A). Normal karyotype (NK) was enriched in MC2, MC4, MC6, MC9, MC10, and MC12, with different distributions of ASXL1, DNMT3A, JAK2, RAS, SF3B1, and TET2 mutations. Complex karyotype and trisomy 8 were enriched in MC13 and MC1, respectively. Remarkably, our ML approach identified functionally relevant genomic clusters (Fig. 1A). For instance, MC2, MC6, and MC8 demonstrated distinct genomic signatures: MC2 included cases with NK only (100%) and some DNMT3A (11%), JAK2 (11%), and RAS pathway (10%) mutations; MC6 cases had similar features to MC2 but were also enriched in SRSF2 (49%) and RAS mutations (23%); MC8 was instead characterized by the presence of del5q/-5 (100%), DNMT3A (17%), and TP53 (17%) mutations. MC13 contained cases with complex karyotype (100%) and TP53 (44%) mutations (Fig. 1B). We then identified five risk groups to reflect the biological differences between clusters. Our clustering model highlighted significant survival differences among patients assigned to similar IPSS-R risk groups but displaying heterogeneous molecular configurations. Different response rates to hypomethylating agents (e.g., MC9 and MC13 [OR: 2.2 and 0.6, respectively]) paralleled the biological differences across the clusters. Interestingly, our MCs retained significant survival differences regardless of the bone marrow blast percentage. Our model is available as a web-based resource (https://drmz.shinyapps.io/mds_latent).
Despite the complexity and diversity of molecular alterations in MDS/sAML, by deploying ML we were able to identify and distinguish functionally and pathologically related MCs irrespective of the anamnestic clinico-morphological features. In recognition that similar outcomes do not imply similar molecular features, our model provides genomic signatures for a better understanding of the pathobiological mechanisms of disease, progression to higher stages, and identification of future therapeutic vulnerabilities.