Introduction: Acute myeloid leukemia (AML) is a blood malignancy that arises from genomic alterations acquired in hematopoietic stem cells (HSCs). Several studies have recognized the importance of these alterations, including chromosomal rearrangements and single nucleotide variations (SNVs), for the classification and risk stratification of patients. However, none of these studies has thoroughly explored the functional impact of these lesions, so certain driver mutations may have been overlooked or misinterpreted. In this study, we focused on splicing, aiming to comprehensively characterize cis-acting splicing-associated variants (SAVs) in a large cohort of AML patients. Methods: We obtained recurrent driver gene variants (n = 3,847) from Table S5 reported by Papaemmanuil et al. and selected unique SNVs (n = 915) for our analysis. Among them, 628 (69%) were missense variants, 232 (25%) were nonsense variants, and 55 (6%) were located within splice sites. To assess their potential impact on splicing, we employed three splicing prediction tools (MaxEntScan, regSNP-splicing, and SpliceAI) and selected variants that at least two of the tools predicted to affect splicing for further functional studies. We first searched for the SNVs in the exomes of the TCGA-LAML (n = 149) and BeatAML (n = 342) datasets available in the Genomic Data Commons repository. If an SNV was identified, we obtained the RNA sequencing BAM file from the corresponding patient's sample and used the rest of the cohort as a control for statistical analysis (significance threshold p<0.05). If the SNV was not found in these datasets, we studied the variant with the pSPL3-based splicing reporter minigene assay. Results: The predictors unanimously identified all 55 splice-site variants as true positives and reclassified 20 missense and nonsense variants as likely SAVs.
Functional validation experiments confirmed that 11 of the 20 reclassified variants (55%) indeed had an impact on splicing. Analysis of RNA sequencing data revealed splicing changes at commonly observed AML hotspots. For instance, U2AF1 c.470A>G and IDH1 c.394C>A created novel donor splice sites located one nucleotide (p=0.04522) and 34 nucleotides (p=0.01878) from the mutation, respectively. In contrast, the TP53 c.395A>G change activated a cryptic acceptor splice site located XX nucleotides away (p=0). Among private AML variants, minigene assays showed that eight variants resulted in aberrant splicing compared to the wild-type. Three variants led to exon skipping due to the loss of canonical donor splice sites (EZH2 c.363G>A, FLT3 c.2523C>A, and TET2 c.3500G>A), four variants activated a novel donor splice site (PTPN11 c.1508G>T, WT1 c.1157C>A, STAG2 c.545T>G, and KDM6A c.1516C>T), and one variant resulted in both exon skipping and the creation of a donor splice site (RAD21 c.688G>A). Consequently, the predicted impact on the protein varied: the FLT3 and TET2 variants potentially lead to in-frame deletions of the ATP-binding site and the TET2 oxygenase domain, respectively, whereas the EZH2 variant likely results in an expression switch from the large to the small isoform. The remaining SNVs were anticipated to trigger nonsense-mediated decay or produce truncated proteins. Conclusions: This study revealed that approximately 7% of SNVs found in a large group of AML patients have the potential to impact splicing. Notably, about 1% of the SNVs analyzed affected splicing despite having been classified as missense or nonsense changes. These findings support a significant role for mis-splicing in the development of leukemia and emphasize the importance of devising novel approaches to effectively characterize driver mutations.
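As an illustrative sketch only (not the authors' pipeline), the consensus rule described in the Methods and the proportions quoted in the Results and Conclusions can be expressed in a few lines of Python. Several things here are assumptions: each tool's output is taken to be already reduced to a yes/no call, "two favorable predictions" is read as "at least two of three", and the 7% and 1% figures are read as (55 + 11)/915 and 11/915. The function names and the example calls are hypothetical.

```python
from typing import Mapping

# Tool names are taken from the abstract; everything else is illustrative.
PREDICTORS = ("MaxEntScan", "regSNP-splicing", "SpliceAI")


def is_candidate_sav(calls: Mapping[str, bool], min_votes: int = 2) -> bool:
    """Return True when at least `min_votes` predictors flag the variant.

    `calls` maps a predictor name to a boolean call. How each tool's raw
    score is turned into that call is NOT specified in the abstract.
    """
    votes = sum(bool(calls.get(tool, False)) for tool in PREDICTORS)
    return votes >= min_votes


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Hypothetical calls for a single variant (not real predictor output).
example_calls = {"MaxEntScan": True, "regSNP-splicing": False, "SpliceAI": True}
candidate = is_candidate_sav(example_calls)  # two of three tools agree

# Counts reported in the abstract.
total_snvs = 915
annotated_splice_site = 55   # variants located within splice sites
reclassified = 20            # missense/nonsense variants predicted as SAVs
validated_reclassified = 11  # reclassified variants confirmed functionally

validation_rate = percent(validated_reclassified, reclassified)
share_splicing = percent(annotated_splice_site + validated_reclassified, total_snvs)
share_reclassified = percent(validated_reclassified, total_snvs)

print(candidate, validation_rate, share_splicing, share_reclassified)
```

Under these assumptions the computed shares (7.2% and 1.2%) are consistent with the rounded 7% and 1% reported in the Conclusions, and the validation rate matches the stated 55%.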