INTRODUCTION: Little is known about the pattern and function of mutations within the 98% of the genome that is non-coding (nc). Whole-genome sequencing (WGS) can identify the full range of single nucleotide variants (SNVs), insertions/deletions (InDels), copy-number variants (CNVs), and structural variants (SVs), which are critical to disease progression. Here, we characterize the non-coding genome to gain insight into the role of mutations in gene regulatory elements in the etiology of multiple myeloma (MM) and into models of how it develops.
METHODS: We studied 302 patients with MM precursor conditions or newly diagnosed MM (NDMM) using high-coverage WGS, requiring each SNV/InDel to be called by two or more algorithms. Results were validated in an independent cohort of 256 NDMM patients with 80X WGS data. The final set of somatic events was determined by a consensus pipeline (https://github.com/pblaney/mgp1000) using Mutect2, Strelka2, and VarScan2 for SNVs; Mutect2, Strelka2, VarScan2, and SvABA for InDels; Battenberg and FACETS for CNVs; and Manta, SvABA, DELLY2, and IgCaller for SVs. The R package fishHook was used to identify statistically significant enrichment of mutations. To identify enriched nc-variants, we partitioned the genome into 10 kb tiles, iteratively shifted by 500 bp, and tested each tile against the regression model built into fishHook. The model includes covariates capturing replication timing, sequence context, and chromatin state.
RESULTS: We identified 2,039,841 SNVs and 492,746 InDels in total. The tumor mutational burden (TMB, somatic mutations per Mb) varied between molecular subgroups: it was significantly higher in t(4;14) (3.23) than in t(11;14) (2.57; FDR-adjusted P=0.035), which was closer to that of patients without a subtype-defining translocation (2.78). For ncSNVs and ncInDels, we identified 4,374 and 272 tiles, respectively, with significant genome-wide mutation enrichment (FDR-adjusted P<0.05).
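The two-caller consensus rule, the sliding 10 kb / 500 bp tiling, and the per-megabase TMB described in METHODS can be illustrated with a minimal Python sketch. It is not the mgp1000 implementation: the helper names (consensus_calls, sliding_tiles, tmb_per_mb) are hypothetical, it assumes every caller's output has already been normalized to identical (chrom, pos, ref, alt) keys, it uses 0-based half-open tile coordinates truncated at the chromosome end, and it leaves the callable-genome size used for TMB as a parameter because the abstract does not state it.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical variant key: (chrom, pos, ref, alt). Assumes every caller's output
# has already been normalized so that the same event yields the same key.
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, Iterable[Variant]],
                    min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for calls in calls_by_caller.values():
        support.update(set(calls))  # count each caller at most once per variant
    return {v for v, n in support.items() if n >= min_callers}


def sliding_tiles(chrom_length: int, width: int = 10_000,
                  step: int = 500) -> Iterator[tuple[int, int]]:
    """Yield 0-based half-open (start, end) tiles of `width` bp, shifted by `step` bp.

    Tiles that would run past the chromosome end are truncated at `chrom_length`.
    """
    for start in range(0, chrom_length, step):
        yield start, min(start + width, chrom_length)


def tmb_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Tumor mutational burden: somatic mutations per megabase of callable sequence."""
    return n_mutations / (callable_bases / 1_000_000)


if __name__ == "__main__":
    # Illustrative calls only; not data from the study.
    calls = {
        "Mutect2": [("chr1", 100, "A", "T"), ("chr2", 50, "G", "C")],
        "Strelka2": [("chr1", 100, "A", "T")],
        "VarScan2": [("chr3", 7, "C", "A")],
    }
    print(consensus_calls(calls))            # {('chr1', 100, 'A', 'T')}
    print(list(sliding_tiles(21_000))[:2])   # [(0, 10000), (500, 10500)]
    print(tmb_per_mb(5, 2_000_000))          # 2.5
```

Counting each caller at most once per variant keeps a caller that emits duplicate records from satisfying the two-algorithm requirement on its own.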
Because tiles may overlap, we collapsed contiguous significant tiles into consensus regions, assigned the nearest coding gene as an identifier, and termed these "mutation-enriched regions" (MERs). We identified 282 MERs associated with 203 genes for ncSNVs and 26 MERs associated with 25 genes for ncInDels. The two types of regions overlap at six loci (TENT5C, OR2T2, FOXD4L1, BCL6, BLOC1S5-TXNDC5, PLD5P1). In total, we identified 302 MERs associated with 221 genes; among the most frequently mutated were BCL6 (76.2% of patients), BLOC1S5-TXNDC5 (28.1%), ZFP36L1 (22.2%), BTG2 (21.2%), IRF8 (16.2%), TENT5C (13.6%), and CCND1 (12.3%). Overall, 19,743 of the 2,532,587 mutations fall into MERs, with 1.3-65.6% of patients having one of these mutations. We evaluated the MERs for functional relevance by intersecting them with a list of 8,357 genome-wide enhancer (E) and super-enhancer (SE) elements derived from germinal-center B cells (GCB), DLBCL (Bal et al. Nature 2022), and MM (Lovén et al. Cell 2013). In total, 17.9% (54/302) of the MERs, involving 45 genes, overlapped these elements. These MERs intersected with 28 Es and 20 SEs, and mutations within them were non-randomly distributed. Of all MER mutations, 21.9% (4,317/19,743) fell into an E or SE: 41.6% (1,798/4,317) in Es and 58.4% (2,519/4,317) in SEs. All E mutations were MM-specific; for SE mutations, 6.0% (18/302) of patients had a mutation in an ABC-DLBCL-specific SE, 16.6% (50/302) in a GCB-specific SE, and 64.2% (194/302) in an MM-specific SE. A closer examination of the SE regions confirmed this non-random distribution, suggesting a selective mechanism. Intersecting the SE regions with SVs revealed an excess of SVs at TENT5C, BTG2, BLOC1S5-TXNDC5, and ZFP36L1. A focused analysis of chr1p, chr1q, chr6q, and chr14 revealed the importance of mutation-induced breaks within the SE and the translocation of the SE to a recipient site, often on 8q, where MYC is located.
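Collapsing overlapping significant tiles into MERs, labeling each MER with its nearest gene, and measuring the fraction of mutations that fall into enhancer elements are standard interval operations. The Python sketch below illustrates them under stated assumptions and is not the authors' code: the helper names (merge_tiles, nearest_gene, fraction_in_elements) and gene names are hypothetical, intervals are 0-based half-open, tiles that overlap or abut are merged, gene distance is measured edge to edge (0 when overlapping), and a simple linear scan stands in for the interval index or tool (such as bedtools) a genome-scale analysis would use.

```python
from typing import Optional

Interval = tuple[str, int, int]      # (chrom, start, end), 0-based half-open
Gene = tuple[str, int, int, str]     # (chrom, start, end, name)


def merge_tiles(tiles: list[Interval]) -> list[Interval]:
    """Collapse overlapping or abutting tiles into contiguous consensus regions."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(tiles):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def nearest_gene(region: Interval, genes: list[Gene]) -> Optional[str]:
    """Name of the gene closest to `region` on the same chromosome (edge-to-edge distance)."""
    chrom, start, end = region
    best_name, best_dist = None, None
    for g_chrom, g_start, g_end, name in genes:
        if g_chrom != chrom:
            continue
        dist = max(0, g_start - end, start - g_end)  # 0 if the intervals overlap
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def fraction_in_elements(mutations: list[tuple[str, int]], elements: list[Interval]) -> float:
    """Fraction of (chrom, pos) mutations that fall inside any element interval."""
    if not mutations:
        return 0.0
    hits = sum(
        any(chrom == e_chrom and e_start <= pos < e_end for e_chrom, e_start, e_end in elements)
        for chrom, pos in mutations
    )
    return hits / len(mutations)


if __name__ == "__main__":
    # Illustrative intervals only; not data from the study.
    tiles = [("chr1", 0, 10_000), ("chr1", 500, 10_500), ("chr1", 20_000, 30_000)]
    mers = merge_tiles(tiles)
    print(mers)  # [('chr1', 0, 10500), ('chr1', 20000, 30000)]
    genes = [("chr1", 12_000, 15_000, "GENE_A"), ("chr1", 40_000, 45_000, "GENE_B")]
    print([nearest_gene(m, genes) for m in mers])  # ['GENE_A', 'GENE_A']
    print(fraction_in_elements([("chr1", 5), ("chr1", 150)], [("chr1", 0, 100)]))  # 0.5
```

The same ratio applied to the reported totals gives 4,317/19,743 ≈ 0.219, the 21.9% of MER mutations that fell into an E or SE.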
CONCLUSIONS: We provide evidence for an important contribution of mutations within E and SE regions to the etiology of MM. This may involve either direct selection of mutations within the GC or re-entry of a memory B cell carrying a pattern of mutations acquired during a pre-MM phase, which then acquires an MM-specific driver.
FIGURE: Distribution of mutations across MM genomes. A) Tumor mutational burden across MM subtypes. B) Q-Q plots of the fishHook model for SNVs.