Abstract

Current somatic mutation callers are biased against repetitive regions, preventing the identification of potential driver alterations in these loci. We developed a mutation caller for repetitive regions, and applied it to study repetitive non-protein-coding genes in more than 2200 whole-genome cases. We identified a recurrent mutation at position c.28 in the gene encoding the snRNA U2. This mutation is present in B-cell derived tumors, as well as in prostate and pancreatic cancer, suggesting U2 c.28 constitutes a driver candidate associated with worse prognosis. We showed that the GRCh37 reference genome is incomplete, lacking the U2 cluster on chromosome 17, preventing the identification of mutations in this gene. Furthermore, the 5′-flanking region of WDR74, previously described as frequently mutated in cancer, constitutes a functional copy of U2. These data reinforce the relevance of non-coding mutations in cancer, and highlight current challenges of cancer genomic research in characterizing mutations affecting repetitive genes.

Highlights

  • The development of Next Generation Sequencing (NGS) technologies has allowed the study of human variation at an unprecedented resolution

  • To explore the potential relationship between the ability to detect somatic mutations in tumor samples and the repetitive nature of different loci, we used the somatic mutations identified by the International Cancer Genome Consortium (ICGC) PanCancer Analysis of Whole Genomes (PCAWG) in 2658 tumor samples[2], and compared the density of mutations identified by different mutation callers at different mapping qualities (Fig. 1a)

  • We observed that the density of somatic mutations detected per megabase decreased as mapping quality decreased, with almost no mutations detected at loci with a mapping quality of 0, and a 90% reduction in the density of mutations at loci with mapping quality below 20
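
The density comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the bin boundaries, and every number in the toy data are assumptions made up for the example. It counts mutations per mapping-quality bin and normalizes by the number of megabases of genome falling in that bin.

```python
from collections import Counter

def mapq_bin(mapq, bins):
    """Return the label of the (label, low, high) bin that contains mapq."""
    for label, low, high in bins:
        if low <= mapq <= high:
            return label
    raise ValueError(f"mapping quality {mapq} is outside all bins")

def density_per_mb(mutation_mapqs, bases_per_bin, bins):
    """Mutations per megabase for each mapping-quality bin.

    mutation_mapqs: mapping quality at the locus of each called mutation
    bases_per_bin:  number of genome bases whose mapping quality is in each bin
    """
    counts = Counter(mapq_bin(q, bins) for q in mutation_mapqs)
    return {
        label: counts[label] / (bases_per_bin[label] / 1e6)
        for label, _, _ in bins
    }

# Hypothetical bins and toy data, chosen only to make the arithmetic visible.
bins = [("0", 0, 0), ("1-19", 1, 19), ("20-60", 20, 60)]
mutation_mapqs = [60, 60, 55, 40, 10, 60, 30, 60]
bases_per_bin = {"0": 2_000_000, "1-19": 1_000_000, "20-60": 1_000_000}

density = density_per_mb(mutation_mapqs, bases_per_bin, bins)
print(density)  # {'0': 0.0, '1-19': 1.0, '20-60': 7.0}
```

Normalizing by the size of each bin matters because low-mapping-quality loci cover far less of the genome than uniquely mappable ones, so raw mutation counts alone would overstate the gap.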


Summary

INTRODUCTION

The development of Next Generation Sequencing (NGS) technologies has allowed the study of human variation at an unprecedented resolution. In the case of cancer, the completion of large cancer genomic studies, such as the International Cancer Genome Consortium (ICGC) or The Cancer Genome Atlas (TCGA), has transformed our knowledge of many of these pathologies with the identification of many driver mutations affecting genes previously unsuspected of being mutated in cancer[1–3]. However, the short read lengths produced by the NGS technologies used in these projects hamper the study of mutations in repetitive regions of the genome. Because of the high sequence identity between these loci, short reads derived from a repetitive region align ambiguously and may be assigned to any one of the repeats, not necessarily the one they came from. The identification of highly recurrent mutations in genes located in repetitive regions suggests that these complex regions might contain additional driver mutations missed by current analytical pipelines.
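
The alignment ambiguity described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the sequences are invented, `exact_match_positions` is a hypothetical helper, and exact string matching stands in for a real aligner, which would tolerate mismatches and report a mapping quality. A read that falls entirely within the identical part of two repeat copies matches both, while a read that spans a base distinguishing the copies matches only one.

```python
def exact_match_positions(read, reference):
    """Return every 0-based start position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

# Two copies of a made-up repeat that differ at a single base (offset 10).
copy_a = "ACGTTGCAAGGCTTAC"
copy_b = copy_a[:10] + "T" + copy_a[11:]
reference = "TTTTT" + copy_a + "CCCCC" + copy_b + "GGGGG"

# An 8-base read from the identical part of the repeat: two equally good placements.
short_read = copy_a[:8]
print(exact_match_positions(short_read, reference))     # [5, 26]

# A read from copy A that covers the distinguishing base: a single placement.
spanning_read = copy_a[4:]
print(exact_match_positions(spanning_read, reference))  # [9]
```

When the copies are identical over a stretch longer than the read length, no read can span a distinguishing base, which is the situation that leaves such loci with very low mapping quality.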

RESULTS
DISCUSSION
CODE AVAILABILITY