Abstract

DNA repair deficiencies in cancers may result in characteristic mutational patterns, as exemplified by deficiency of BRCA1/2 and efficacy prediction for PARP inhibitors. We trained and evaluated predictive models for loss-of-function (LOF) of 145 individual DNA damage response genes based on genome-wide mutational patterns, including structural variants, indels, and base-substitution signatures. We identified 24 genes whose deficiency could be predicted with good accuracy, including expected mutational patterns for BRCA1/2, MSH3/6, TP53, and CDK12 LOF variants. CDK12 is associated with tandem duplications, and we here demonstrate that this association can accurately predict gene deficiency in prostate cancers (area under the receiver operating characteristic curve = 0.97). Our novel associations include mono- or biallelic LOF variants of ATRX, IDH1, HERC2, CDKN2A, PTEN, and SMARCA4, and our systematic approach yielded a catalogue of predictive models, which may provide targets for further research and development of treatment, and potentially help guide therapy.

Editor's evaluation

This is a well-motivated study looking at the association of DNA repair deficiencies with mutational patterns. This study is of interest to the cancer genomics community and highlights how the understanding of DNA repair processes can be used in the development of novel cancer therapy, and will also be of interest to researchers in the field of genomic medicine and cancer mutagenesis. It presents predictive models with potential clinical applications that can identify patients with specific gene dysfunction based on characteristic patterns of mutation. The key findings are well supported.
https://doi.org/10.7554/eLife.81224.sa0

eLife digest

Many different aspects of the environment – such as ultraviolet radiation, carcinogens in food and drink, and the ageing process itself – damage the DNA in human cells. Normally, cells can repair these sites by activating a mechanism known as the DNA damage response. However, the hundreds of genes that orchestrate this response are themselves often lost or damaged, allowing the unrepaired sites to turn into permanent mutations that accumulate across the genome of the cancer cell.

By studying the DNA of cancer cells, it has been possible to identify characteristic patterns of mutations, called mutational signatures, that appear in different types of cancer. One specific pattern has been linked to the loss of either the BRCA1 or BRCA2 gene, both of which are part of the DNA damage response. However, it remained unclear how many other genes involved in the DNA damage response also lead to detectable mutational signatures when lost.

To investigate, Sørensen et al. computationally analysed data from over six thousand cancer patients. They looked for associations between over 700 DNA damage response genes and 80 different mutational signatures. As expected, the analysis revealed a strong connection between the loss of BRCA1/BRCA2 and their known mutational signature. However, it also found 23 other associations between DNA damage response genes that had been lost or damaged and particular patterns of mutations in a variety of cancers.

These findings suggest that mutational signatures could be used more widely to predict which DNA damage response genes are no longer functioning in the genome of cancer cells. The mutational signature caused by the loss of BRCA1/BRCA2 has been shown to make patients more responsive to a certain type of chemotherapy. Further experiments are needed to determine whether the connections identified by Sørensen et al. could also provide information on which treatment would benefit a cancer patient the most. In the future, this might help medical practitioners provide more personalized treatment.

Introduction

The DNA damage response (DDR) and repair pathways are central to the genetic integrity of cells, and deficiencies may cause mutational patterns genome-wide (Lindahl, 1993; Nik-Zainal et al., 2012; Volkova et al., 2020). Some DNA repair deficiencies are known to modulate the response to therapies: BRCA1/2 deficiency renders cancers susceptible to treatment with PARP inhibitors (Bryant et al., 2007), mismatch repair (MMR)-deficient cancers are sensitive to checkpoint inhibitors (Le et al., 2015) but resistant to alkylating agents such as temozolomide (von Bueren et al., 2012), and CDK12-mutated cancers have a suggested sensitivity to CHK1 inhibitors (Paculová et al., 2017). Because of this, efforts have been made to annotate inactivating mutations in DDR genes (Landrum et al., 2014). However, the approach is limited by the lack of functional impact annotation of most variants, which are generally denoted as ‘variants of unknown significance’ (VUS). Moreover, loss of gene activity could also occur by other means, such as transcriptional silencing.

A complementary approach is to investigate whether DNA repair deficiencies can be identified by DNA mutational patterns, also referred to as ‘mutational scars’. This approach has been pioneered for homologous recombination deficiency (HRD) caused by BRCA1/2 deficiencies (-d), which can be successfully predicted by measuring the accumulation of small deletions with neighbouring microhomologous sequences (Nik-Zainal et al., 2012; Davies et al., 2017; Nguyen et al., 2020), as done by the HRDetect algorithm (Davies et al., 2017).
The association with microhomologous deletions is due to the use of microhomology-mediated end joining to repair double-strand breaks in homologous recombination-deficient tumours (McVey and Lee, 2008; Nussenzweig and Nussenzweig, 2007). Likewise, MMR deficiency causes an elevated rate of mono- and dinucleotide repeat indels genome-wide, a genetic phenotype denoted microsatellite instability (MSI; Umar et al., 1994; Edelmann et al., 2000). Mutations in other DNA repair genes have also been associated with mutational patterns, including the tumour suppressor gene TP53, which is associated with increased structural rearrangements and whole-genome duplications (Lanni and Jacks, 1998; Gorgoulis et al., 2005), and CDK12, which is associated with a genome-wide phenotype of large tandem duplications (Popova et al., 2016; Menghi et al., 2018).

The scope of this approach can now be evaluated systematically across DDR genes by exploiting available whole cancer genomes from thousands of patients (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, 2020; Priestley et al., 2019). To achieve this, mutations observed genome-wide may be condensed into mutational summary statistics for predictive modelling, including statistics based on single base substitutions (SBSs), indels, and different types of structural variants (SVs). The SBSs are statistically assigned to so-called SBS signatures that are catalogued and enumerated within the COSMIC database (Tate et al., 2019). Some of these are associated with specific DNA repair deficiencies as well as genotoxic exposures, such as ultraviolet (UV) light and smoking. Each SBS signature captures the relative frequency of the different mutation types and their flanking nucleotides (Alexandrov and Stratton, 2014).

Here, we performed a systematic screen for DDR gene deficiencies that can be predicted through their association with genome-wide mutational patterns.
We developed a generic approach to train predictive statistical models that identify associations with individual mutational summary statistics that capture the mutational patterns, including SBS signatures, indels, and large SVs. We applied it to 736 DDR gene deficiencies, considering both mono- and biallelic loss-of-function (LOF), identified across 32 cancer types, in a combined set of whole cancer genomes from 6065 patients (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, 2020; Priestley et al., 2019). The underlying aim was to identify novel associations with potential biological relevance and to evaluate whether DDR deficiencies can be predicted with sufficiently high certainty to have a potential for clinical application.

Our analysis revealed 24 DDR genes where deficiencies are associated with specific mutational summary statistics in individual cancer types across 48 predictive models. These results recapitulated the expected associations between mutational patterns and deficiencies of BRCA1/2, TP53, MSH3/6, and CDK12. We supplemented this knowledge by providing a predictive model of CDK12 deficiency that achieved high accuracy (area under the receiver operating characteristic curve [AUROC] = 0.97) in prostate cancer. Furthermore, we present unexpected predictive models of several DDR deficiencies: ATRX and IDH1 deficiency in cancers of the central nervous system (CNS), HERC2 and CDKN2A deficiency in skin, PTEN deficiency in cancers of the CNS and uterus, and SMARCA4 deficiency in cancers of unknown primary.

Results

DDR gene deficiencies across 6065 whole cancer genomes

We compiled and analysed 2568 whole-genome sequences (WGS) from The Pan-Cancer Analysis of Whole Genomes (PCAWG) (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, 2020) and 3497 WGS from the Hartwig Medical Foundation (HMF) (Priestley et al., 2019). In total, we investigated 6065 whole cancer genomes of 32 cancer types (Figure 1a; Supplementary file 1a).
Figure 1 (with 1 supplement). Cancer types, DNA damage response (DDR) gene deficiencies, and mutational patterns. (a) Cohort sizes for the 32 cancer types comprising the 6065 whole cancer genomes collected from the Hartwig Medical Foundation (HMF; n = 3497) and the Pan-Cancer Analysis of Whole Genomes (PCAWG; n = 2568). (b) Mono- and biallelic loss-of-function (LOF) events were annotated across 736 DDR genes based on both pathogenic variants and copy number losses (loss of heterozygosity; LOH), overall and (c) per patient, (d) with varying numbers of LOF events per DDR gene (x-axis; logarithmic). (e) Whole-genome mutational patterns were represented as summary statistics and used as input features for the predictive models of DDR gene deficiency. Concretely, each patient was annotated with the number of single-base substitutions (SBSs) attributed to each SBS signature (Alexandrov and Stratton, 2014; Degasperi et al., 2020), the number of indels divided by context (mh = microhomology; rep = repetitive), and (f) the number of structural variants divided by clusterness, size, and type (del = deletion; inv = inversion; tds = tandem duplication; trans = translocation).

For each genome, we evaluated 736 known DDR genes for both germline and somatic LOF events (Pearl et al., 2015; Knijnenburg et al., 2018; Olivieri et al., 2020). We annotated both mono- and biallelic LOF events, where each event could be either a single-nucleotide variant, an indel, or a loss of heterozygosity (LOH) (Figure 1b, c; Supplementary file 1b). Pathogenicity of SBSs and indels was evaluated using a combination of CADD scores (>25; 0.3% most pathogenic variants) (Rentzsch et al., 2019) and ClinVar annotation, when available (Methods).
We inferred a total of 8408 biallelic DDR gene deficiencies, primarily through a combination of somatic or germline variants (SBSs and indels) with pathogenic potential (n = 1702), or LOH events combined with a single pathogenic germline (n = 3562) or somatic (n = 3078) variant (SBS or indel; Figure 1b). On average we observed a single biallelic DDR gene loss per patient, with some tumours showing extreme rates of somatic pathogenic mutations (Figure 1c; Figure 1—figure supplement 1). As expected, TP53 deficiency (TP53-d) was the most frequent LOF event (81 biallelic and 1746 monoallelic events; 29% of tumours affected; Supplementary file 1b; Figure 1d), while 70% of DDR genes had biallelic deficiency in fewer than 10 tumours across all cancer types (511/736; Figure 1d). Among monoallelic events, we identified 15,063 pathogenic germline (59%) and 10,336 somatic (41%) events.

Whole-genome mutational patterns

We collected mutational summary statistics for each cancer genome, which were used as features for the downstream predictive models (Figure 1e, f; Supplementary file 1c, d). For SBSs, we evaluated exposure towards predefined sets of cohort-specific SBS signatures (Alexandrov and Stratton, 2014; Degasperi et al., 2020). Short indels and SVs were simply categorised and counted: deletions were sub-categorised based on surrounding sequence repetitiveness and presence of microhomology. SVs were sub-categorised by type (tandem duplications, inversions, deletions, and translocations), five size ranges (not relevant for translocations), and cluster presence (Methods). Several SBS signatures as well as some types of indels have suggested aetiologies (collected in Supplementary file 1e).

Statistical modelling of DDR gene deficiencies

For the downstream statistical analysis, we restricted our focus to DDR genes in cancer types with more than five biallelic LOF events in either PCAWG or HMF (n = 194) or more than 10 monoallelic LOF events (n = 341) (Supplementary file 1f).
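The inclusion rule in the preceding sentence can be sketched as a simple count-and-filter step. This is a minimal illustration, not the authors' pipeline: the record layout, the function name `select_model_groups`, and the placeholder genes (`GENE_A` to `GENE_D`) are all hypothetical, and the thresholds follow the wording of the paragraph above.

```python
from collections import Counter

# Thresholds taken from the text: "more than five" and "more than 10".
MIN_BIALLELIC = 5
MIN_MONOALLELIC = 10


def select_model_groups(lof_events):
    """Return the (data set, cancer type, gene, zygosity) groups passing the cut-off.

    `lof_events` holds one hypothetical record per tumour carrying a LOF event.
    """
    counts = Counter(lof_events)
    selected = []
    for group, n_events in sorted(counts.items()):
        zygosity = group[3]
        threshold = MIN_BIALLELIC if zygosity == "biallelic" else MIN_MONOALLELIC
        if n_events > threshold:  # strictly more than the threshold
            selected.append((group, n_events))
    return selected


# Hypothetical toy input: two groups sit exactly on the threshold and are dropped.
events = (
    [("HMF", "breast", "GENE_A", "biallelic")] * 6
    + [("HMF", "breast", "GENE_B", "biallelic")] * 5
    + [("PCAWG", "skin", "GENE_C", "monoallelic")] * 11
    + [("PCAWG", "skin", "GENE_D", "monoallelic")] * 10
)
groups = select_model_groups(events)
```

Each retained group would then correspond to one candidate model.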
Using BRCA2-d in the set of HMF breast cancer tumours (n = 645) as an example, we observed biallelic LOF events in 17 (2.6%; 14 germline, 3 somatic) tumours and monoallelic LOF events in 7 (1.1%; 4 germline, 3 somatic) tumours (Supplementary file 1g). We further observed VUS events in 53 tumours (8.2%; 42 germline, 11 somatic), which were excluded from the analysis. The remaining BRCA2 wild-type (WT) tumours (n = 568; 88.1%) were used as a background set for training the predictive models (Figure 2a). The high fraction of germline pathogenic variants diminishes the probability of a reverse-causal relationship between the loss of BRCA2 and the associated mutation patterns.

Figure 2. Predictive modelling of BRCA2 deficiencies in the Hartwig Medical Foundation (HMF) breast cancers. (a) Mutational status of BRCA2 across 645 HMF breast cancer patients. (b) Mutational summary statistics for the HMF breast cancer patients divided by biallelic BRCA2 loss-of-function (LOF; red) and BRCA2 wild-type (WT; grey) (selected predictive features in bold). (c) Predictive features and their coefficients for the model of biallelic BRCA2 loss, with predictive performance measured as (d) area under the receiver operating characteristic curve (AUROC) and (e) precision-recall area-under-the-curve (PR-AUC) (PR-AUC-E = PR-AUC − baseline = 0.29; Methods). (f) Distributions of AUROC and (g) PR-AUC-E values obtained from 30,000 random data permutations compared to observed values (punctuated lines). (h) Correlation between selected predictive features (horizontal) and other highly correlated (Pearson corr. >0.65) mutational features (vertical).

For each of the 535 groups of tumours we trained a least absolute shrinkage and selection operator (LASSO) regression model and evaluated the ability to discriminate between deficient and WT tumours (Methods).
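To illustrate how a LASSO penalty selects features, the sketch below fits an L1-penalised logistic regression by proximal gradient descent on toy data. Several things here are assumptions rather than details from the text: that the model is logistic, the fitting procedure, the penalty and step size, and the two standardised features (an informative one standing in for microhomology deletions and an uninformative one). The fitting details used in the study are given in its Methods, and an established library would normally be preferred over hand-written optimisation.

```python
import math


def soft_threshold(value, amount):
    """Shrink `value` towards zero by `amount` (the proximal step of the L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, penalty=0.05, step=0.1, n_iter=2000):
    """Fit a logistic model with an L1 penalty on the weights (intercept unpenalised)."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            z = intercept + sum(w * x for w, x in zip(weights, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        intercept -= step * grad_b / n
        weights = [
            soft_threshold(w - step * g / n, step * penalty)
            for w, g in zip(weights, grad_w)
        ]
    return intercept, weights


# Hypothetical standardised features per tumour: (informative, uninformative).
X = [
    (2.0, 1.0), (2.0, -1.0), (1.0, 1.0), (1.0, -1.0), (-0.5, 1.0), (-0.5, -1.0),
    (-2.0, 1.0), (-2.0, -1.0), (-1.0, 1.0), (-1.0, -1.0), (0.5, 1.0), (0.5, -1.0),
]
y = [1] * 6 + [0] * 6  # 1 = deficient, 0 = wild-type

intercept, weights = fit_l1_logistic(X, y)
```

On this toy input the informative feature receives a positive coefficient while the uninformative one stays at exactly zero, which is the feature-selection behaviour that makes the resulting models sparse.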
For BRCA2-d, we observed a strong association with the number of deletions at sites of microhomology (Figure 2b), with a median of 608 deletions per patient in BRCA2-d breast cancers versus 81 in BRCA2 WT breast cancers, in agreement with prior findings (Nik-Zainal et al., 2012; Davies et al., 2017; Nguyen et al., 2020). The LASSO regression also included non-clustered inversions 10–100 kb and clustered tandem duplications 1–10 kb, although both show high variance among tumours for both deficient and WT (Figure 2b) and have considerably smaller coefficients, ultimately contributing little to overall predictive performance (Figure 2c).

Notably, some models include features with negative coefficients. The biological interpretation would be that tumours with a certain gene deficiency have fewer mutations attributed to a particular mutation pattern. Negative features were excluded in the development of the HRDetect algorithm (Davies et al., 2017), but we include them as we cannot rule out the possibility that a DDR deficiency protects from specific types of mutagenesis. Though not distinguishable in this study, we suggest that negative coefficient features may arise in three ways: first, they may stem from enhanced repair; second, they may stem from the decomposition of mutation counts into signatures; and third, the mutated tumours may represent a subclass of patients in terms of age, gender, or tumour subtype with specific mutational patterns.

Evaluating model performance

For each model, we evaluated the predictive performance using the area under the receiver operating characteristic curve (AUROC) score as well as the precision-recall area-under-the-curve (PR-AUC) score. The PR-AUC score is a more robust measure for unbalanced data sets (Davis and Goadrich, 2006); however, the expected value for non-informative (unskilled) models equals the fraction of true positives in the data set and thus varies between models.
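The dependence of the PR-AUC baseline on class balance can be made concrete with a short sketch. It estimates PR-AUC by average precision, which is an assumption (the estimator used in the study is described in its Methods), and subtracts the fraction of positives, matching the definition PR-AUC-E = PR-AUC − baseline given in the Figure 2 legend. The function names and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision over the ranked list (ties are kept in input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_positive = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_positive


def pr_auc_enrichment(labels, scores):
    """PR-AUC minus the baseline expected from an uninformative model."""
    baseline = sum(labels) / len(labels)  # fraction of deficient tumours
    return average_precision(labels, scores) - baseline


# Hypothetical toy cohort: 2 deficient tumours among 10, ranked 1st and 3rd.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
enrichment = pr_auc_enrichment(labels, scores)
```

Subtracting the baseline puts models trained on cohorts with different proportions of deficient tumours on a more comparable scale.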
Therefore, we used the PR-AUC enrichment over the true-positive rate (PR-AUC-E) as our selection criterion for predictive models.

Shortlisting models

For the downstream analysis, we included (shortlisted) models with PR-AUC-E that was substantial (>0.2; more than two standard deviations above the mean across all 535 models) and significant (Benjamini–Hochberg false discovery rate, FDR <0.05; Monte Carlo simulations) (Figure 3; Supplementary file 1h).

Figure 3 (with 3 supplements). Predictive models of DNA damage response (DDR) gene deficiencies. (a) The precision-recall AUC enrichment (PR-AUC-E; x-axis) and significance (false discovery rate [FDR]; logarithmic y-axis) of the 535 predictive models (one model per gene with more than 5 biallelically mutated tumours, or more than 10 tumours that are either mono- or biallelically mutated, in either Hartwig Medical Foundation (HMF) or The Pan-Cancer Analysis of Whole Genomes (PCAWG) in any one cancer type; Methods). Significance (q-value representing FDR) was evaluated by counting equally or more extreme PR-AUC-E values across >10,000 permuted data sets and applying Benjamini–Hochberg FDR control. Models with FDR below 0.05 and PR-AUC-E above 0.2 are shortlisted (Methods). (b) Shortlisted predictive models of deficiency of BRCA1 or BRCA2; (c) TP53 monoallelic predictive models; (d) monoallelic gene deficiency models across colorectal cancer patients; and (e) remaining gene deficiency models not contained in the other sub-groups. Numbers indicate the number of mutated out of the total number of tumours included in the development of each model.

Figure 4. Predictive models with anticipated aetiology or origin. (a) Overview of predictive models for BRCA1-d and BRCA2-d, showing data source, type of model, and loss-of-function (LOF)-set statistics. (b) PR-AUC-E, (c) area under the receiver operating characteristic curve (AUROC), and (d) the predictive features and their coefficients for individual models. (e–g) Overview of predictive models of TP53-d (as in a–c). (h) For each cohort, the number of structural variants (x-axis; logarithmic) for TP53 LOF tumours (red) versus TP53 wild-type tumours (grey) and (i) the significance of their difference (two-sided Wilcoxon rank-sum test). (j–l) Predictive models of gene deficiencies in colorectal cancers (as in a–c). (m) Number of deletions in repetitive DNA (as in h) and (n) its significance (as in i). (o) The predictive features of each model (as in d) and (p) the percentage of tumours that are co-mutated with MSH3.

Testing models in the opposite data set

Additionally, we calculated the PR-AUC-E of each model when applied to the same cohort in the opposite data set (Figure 3—figure supplement 1). Due to the difference in biology between the two sets, and the low numbers of LOF-mutated samples, we did not use this as a model performance criterion but have included the PR-AUC-E values and p values from the tests (Supplementary file 1h). We identified significant predictive power, across both metastatic and primary cancers, for deficiency models of BRCA1/2, TP53, CDK12, PTEN, ARID1A, and IDH1. Each case is described in the respective part of the results.

BRCA example

In the example of BRCA2-d in breast cancers of the HMF data set, our model achieved an AUROC of 0.93 and a PR-AUC-E of 0.29 (Figure 2d, e; Supplementary file 1h). Out of 30,000 permuted LOF-sets, none had a similar or higher PR-AUC score and we considered the model significant with a p-value <3 × 10^-5 (FDR adjusted q-value <6 × 10^-4) (Figure 2f, g). The model achieved a PR-AUC-E of 0.37 when tested on the PCAWG data set, suggesting that the model may generalise across both metastatic and non-metastatic tumours. This was further supported by the independent discovery of a similar model in the PCAWG data set, which had similar predictive power in the HMF data (PR-AUC-E = 0.19; Supplementary file 1h).
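The significance and shortlisting steps described above can be sketched as follows: an empirical p-value from permuted statistics, Benjamini–Hochberg adjustment across models, and the two shortlisting thresholds from the text (PR-AUC-E >0.2 and FDR <0.05). The function names are hypothetical, and the add-one correction in the empirical p-value is an assumption of this sketch rather than a detail taken from the study.

```python
def empirical_p_value(observed, permuted):
    """Share of permuted statistics at least as extreme as the observed one.

    The +1 in numerator and denominator (an assumption here) avoids p = 0.
    """
    extreme = sum(1 for value in permuted if value >= observed)
    return (extreme + 1) / (len(permuted) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


def is_shortlisted(pr_auc_e, q_value, min_enrichment=0.2, max_fdr=0.05):
    """Apply the two shortlisting thresholds stated in the text."""
    return pr_auc_e > min_enrichment and q_value < max_fdr


# Hypothetical p-values for four models.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With no permuted value reaching the observed statistic, the empirical p-value is bounded by roughly one over the number of permutations, which is why such results are reported as upper bounds.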
Notably, the BRCA2-d model did not include non-clustered deletions <100 kb, SBS signature 3, and SBS signature 8, all features which have been associated with BRCAness (Davies et al., 2017). However, SBS signature 3 and non-clustered deletions 1–10 kb are included in the model when the deletions at sites of microhomology are omitted from the input data set, suggesting that they are excluded during feature selection due to high positive correlation with the number of deletions at sites of microhomology among HMF breast cancers (Pearson corr. >0.7; Figure 2h; Supplementary file 1i).

Our selection criteria resulted in 48 shortlisted predictive models across 24 DDR genes (Figure 3a; Supplementary file 1h). As exemplified for BRCA2, each model is specified by a set of predictive features representing mutational patterns associated with DDR gene LOF. We divided the models into four groups based on aetiology and origin: models of BRCA1/2-d (eight models of BRCA2-d and a single model of BRCA1-d; Figure 3b); models of monoallelic TP53-d (11 models; Figure 3c); models of various monoallelic gene deficiencies derived from colorectal cancer patients (eight models; Figure 3d); and models including other DDR genes and cancer types, including previously undescribed associations (20 models; Figure 3e).

Survival analysis

For each of the shortlisted models, we evaluated the difference in overall survival between samples carrying LOF mutations and those that did not. We observed nominally significant differences (p < 0.05; univariate Cox regression analysis) in survival for BRCA2 and TP53 in multiple cancer types as well as for UVRAG in colorectal cancer (Figure 3—figure supplements 2 and 3; Supplementary file 1j). The association of TP53 monoallelic LOF with decreased survival is in line with expectations (Malcikova et al., 2009).
Interestingly, several models of BRCA1/2 LOF mutations were associated with improved survival, including BRCA1 LOF mutations in metastatic ovarian cancers (hazard-ratio <0.42; p < 0.093) and BRCA2 LOF mutations in non-metastatic ovarian cancers (hazard-ratio <0.24; p < 0.017). In contrast, BRCA2 LOF mutations in primary breast cancers were associated with decreased survival (hazard-ratio >9.30; p < 0.004) (Figure 3—figure supplements 2 and 3). This may potentially reflect differences in both molecular diagnostic practices and treatment regimens across these cancer types. For instance, platinum-based treatment irrespective of BRCA1/2 status has been standard for groups of ovarian and pancreatic cancer patients, while traditionally not for breast cancer patients (Gennari et al., 2021; Colombo et al., 2019). The sensitising effect of BRCA1/2 deficiency might thus explain the associated survival differences among cancer types (Kennedy et al., 2004). For most models the differences in survival were insignificant, though this may be related to the generally small set of LOF-mutated samples.

Recapitulation and predictive modelling of expected associations with BRCA1/2 deficiency

Five models predicted biallelic loss of BRCA2 in cancers of the ovary, prostate, pancreas, and breast. In addition, three models predicted BRCA2 monoallelic loss in cancers of the pancreas, breast, and prostate. Finally, we derived a single model of biallelic BRCA1 loss in ovarian cancer (Figures 3c and 4a). All models significantly outperformed their Monte Carlo simulations (q < 0.05; Benjamini–Hochberg FDR control) and had PR-AUC-E above 0.2 (Figure 4b, c). All BRCA2-d models were predominantly predicted by deletions at sites of microhomology, consistent with the role of BRCA2 in homologous recombination and suppression of microhomology-mediated end joining (Ceccaldi et al., 2015).
Both clustered and non-clustered tandem duplications in the range of 1–100 kb were included as features for various models, though with much smaller predictive power. This agrees with what was identified for BRCA2-deficient tumours in prior studies (Davies et al., 2017; Nguyen et al., 2020). The biallelic breast cancer model based on PCAWG further included SBS signature 3 as a predictive feature (Figure 4d).

In contrast to the models of BRCA2-d, BRCA1-d in ovarian cancer was exclusively associated with clustered and non-clustered tandem duplications (1–10 kb; Figure 4d). This aligns with prior studies (Davies et al., 2017; Nguyen et al., 2020), which also found BRCA1-d to be closely associated with a tandem-duplicator phenotype. In general, BRCA1 and BRCA2 were subject to predominantly germline pathogenic events, and not a single deletion at a site of microhomology, suggesting the expected forward causality (Supplementary file 1g). As for the loss of BRCA2, the model for loss of BRCA1 in ovarian cancer had sufficient predictive power (PR-AUC-E = 0.3) in the other data set, suggesting that the model works independently of the metastatic capacity of the tumour (Supplementary file 1h).

TP53 deficiencies associate with increased numbers of SVs

We detected 11 predictive models (four based on PCAWG and seven on HMF) of monoallelic TP53-d across cancers of the breast, skin, ovary, uterus, neuro-endocrine tissues, biliary gland, head and neck, pancreas, and the CNS (Figure 4e). These predictive models performed with PR-AUC-E values ranging from 0.21 to 0.48 in breast and biliary gland cancers, respectively. Similarly, AUROC values ranged from 0.48 to 0.88, again in breast and biliary gland cancers (Figure 4f, g). In line with existing literature (Hanel and Moll, 2012), TP53-d is associated with a significantly increased number of SVs across the genome (Wilcoxon rank-sum test; Figure 4h, i), except in skin cancers.
The models of TP53 loss in head and neck, skin, breast, and the biliary gland performed well (PR-AUC-E above 0.2) in the other data set, suggesting that the predictive performance generalises independently of metastatic tumour state (Figure 3—figure supplement 1; Supplementary file 1h).

Colorectal cancer models derived from hypermutated MMR-deficient tumours

In the HMF colorectal cancers, we discovered eight predictive gene deficiency models (MSH3, SMC2, SMC6, BMPR2, CLASP2, SRCAP, UBR5, and UVRAG) of monoallelic LOF with PR-AUC-E ranging from 0.21 (CLASP2) to 0.62 (MSH3) (AUROC ranging from 0.68 for SRCAP deficiency to 0.94 for MSH3 deficiency; Figure 4j–l). We suggest that the high number of models of monoallelic deficiencies may arise from spurious LOF events in DDR genes in a subset of colorectal cancers that are hypermutated. In line with this, the hypermutated samples (n = 18; >100,000 mutations) harbour 22% (5.9-fold enrichment) of all the DDR LOF events across the HMF colorectal cancer samples (n = 475).

Some colorectal cancers are characterised by MMR deficiencies, such as LOF of MSH3 or MSH6, creating a high number of deletions in repetitive DNA (Umar et al., 1994; Edelmann et al., 2000). Indeed, we found that this pattern was most pronounced among the MSH3-mutated cancers (Figure 4m, n). Furthermore, we found co-mutation with MSH3 across the tumours underlying each model, ranging from 20% (SRCAP) to 33% (BMPR2) of the mutated tumours (Figure 4p). This suggests that the models (except for the model of MSH3-d) might be the consequence of the hypermutator phenotype. In other words, the causality may be reversed in these cases, and the mutational process driven by MSH3-d may have caused the majority of their LOF events. This notion is supported by investigating the features of the models.
All eight models are characterised by a single, primary predictive feature: insertions (SMC2, SMC6, BMPR2, UBR5, and UVRAG), deletions in repetitive DNA (MSH3 and SRCAP), or deletions not flanked by repetitive or microhomologous DNA (CLASP2) (Figure 4o). Each of these features has a high positive correlation (Pearson corr. >0.93) with the number of deletions in repetitive DNA. This correlation suggests that all eight models relate to a genome-instability phenotype, which may be driven by the MSH3 co-mutated tumours or, potentially, a concurrent deficiency of other genes within the MMR system (Supplementary file 1i). Notably, the deficiency models of UBR5, BMPR2, CLASP2, and SMC6 all had PR-AUC-E above 0.2 in the other data set, suggesting that these genes are associated with the MMR phenotype regardless of metastatic state (Supplementary file 1h).

Biallelic LOF of MSH6 associated with increased number of deletions in repetitive DNA in prostate cancer

MSH6, a gene implicated in MMR and microsatellite stability (Edelmann et al., 2000), was mutated in both alleles in 7 out of 342 HMF prostate cancer patients. We observed pathogenic indels in MSH6 in all seven tumours, but only one of the
