Abstract

Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not classify MNVs accurately, and our understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome whose constituent variants fall within 2 bp of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms (CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions) on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation and refine our understanding of the genome-wide mutational mechanisms that generate MNVs.
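The MNV definition above (nearby variants on the same haplotype, here with constituent SNVs at most 2 bp apart) can be sketched as a scan over phased SNV calls. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes genotypes are already phased so each alternate allele is assigned to haplotype 0 or 1, it considers SNVs only, and the `PhasedVariant` type and `find_mnv_pairs` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class PhasedVariant:
    chrom: str
    pos: int        # 1-based genomic position
    ref: str
    alt: str
    haplotype: int  # assumption: 0 or 1, the phased haplotype carrying the alt allele


def find_mnv_pairs(variants, max_distance=2):
    """Return pairs of SNVs on the same chromosome and haplotype at most max_distance bp apart."""
    snvs = [v for v in variants if len(v.ref) == 1 and len(v.alt) == 1]
    snvs.sort(key=attrgetter("chrom", "haplotype", "pos"))
    pairs = []
    # After sorting, all SNVs sharing (chrom, haplotype) are contiguous, so groupby is safe.
    for _, group in groupby(snvs, key=attrgetter("chrom", "haplotype")):
        group = list(group)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second.pos - first.pos > max_distance:
                    break  # sorted by position, so every later SNV is farther away
                pairs.append((first, second))
    return pairs


calls = [
    PhasedVariant("chr1", 100, "A", "G", 0),
    PhasedVariant("chr1", 101, "C", "T", 0),
    PhasedVariant("chr1", 102, "G", "A", 1),  # other haplotype: nearby but not in phase
    PhasedVariant("chr1", 110, "T", "C", 0),  # same haplotype but too far away
]
for first, second in find_mnv_pairs(calls):
    print(first.chrom, first.pos, second.pos)  # → chr1 100 101
```

The variant at position 102 sits within 2 bp of both others yet is not reported, which is the point of haplotype awareness: proximity alone does not make an MNV. The sketch reports pairs rather than merged runs, so three adjacent in-phase SNVs yield three pairs.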

Highlights

  • Studies of newly occurring (de novo) multi-nucleotide variants (MNVs) have been performed using trio data sets[2,14,15,16]; analysis of 283 trios with whole-genome sequence data[16] confirmed that MNV events occur much more frequently than expected by random chance

  • As part of the Deciphering Developmental Disorders (DDD) study[17], Kaplanis et al.[2] analyzed exome-sequence data from over 6,000 trios to quantify the pathogenic impact of MNVs in developmental disorders, showing that such variants are substantially more likely to be deleterious than SNVs and further clarifying the mutational mechanisms that generate them

  • We analyzed 125,748 human exomes and 15,708 genomes and identified 1,792,248 MNVs across the genome whose constituent variants fall within 2 bp of one another, including 31,575 that lie within a single codon
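Within-codon MNVs are where annotating variants one at a time can go wrong: two SNVs translated separately can predict a different protein change than the same two changes read together on one haplotype. The sketch below shows this with the standard genetic code. The function names are hypothetical, and treating "the combined amino acid matches neither single-SNV prediction" as a novel combined effect is a simplifying assumption, not necessarily the study's exact criterion.

```python
# Standard genetic code, bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def apply_changes(codon: str, changes: dict[int, str]) -> str:
    """Return the codon with each 0-based offset replaced by its alternate base."""
    bases = list(codon)
    for offset, alt in changes.items():
        bases[offset] = alt
    return "".join(bases)


def annotate_codon_mnv(ref_codon: str, changes: dict[int, str]):
    """Compare SNV-by-SNV translation with translation of all changes together."""
    ref_aa = CODON_TABLE[ref_codon]
    one_at_a_time = [CODON_TABLE[apply_changes(ref_codon, {o: a})] for o, a in changes.items()]
    combined = CODON_TABLE[apply_changes(ref_codon, changes)]
    # Hypothetical criterion: the combined amino acid matches no single-SNV prediction.
    is_novel = combined not in one_at_a_time
    return ref_aa, one_at_a_time, combined, is_novel


# AAA (Lys): A>T at offset 0 alone gives TAA (stop); A>C at offset 1 alone gives ACA (Thr);
# together they give TCA (Ser), a missense change rather than a stop gain.
print(annotate_codon_mnv("AAA", {0: "T", 1: "C"}))  # → ('K', ['*', 'T'], 'S', True)
```

In this example an SNV-only annotator would flag a stop-gained variant, while the haplotype-aware reading shows a single missense substitution.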


Summary

Introduction

Studies of newly occurring (de novo) MNVs have been performed using trio data sets[2,14,15,16]; analysis of 283 trios with whole-genome sequence data[16] confirmed that MNV events occur much more frequently than expected by random chance. As part of the Deciphering Developmental Disorders (DDD) study[17], Kaplanis et al.[2] analyzed exome-sequence data from over 6,000 trios to quantify the pathogenic impact of MNVs in developmental disorders, showing that such variants are substantially more likely to be deleterious than SNVs and further clarifying the mutational mechanisms that generate them. These analyses have provided estimates of the germline MNV rate per generation that fall consistently within 1–3% of the SNV rate. To deepen our understanding of MNV mechanisms, we examine the distribution of MNVs across the human genome stratified by more than ten functional annotations, and we estimate the genome-wide per-base frequencies of the dominant mutational processes that generate MNVs.
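To see why an MNV rate of 1–3% of the SNV rate is far above chance, one can compute how many de novo SNV pairs would fall within 2 bp of each other if mutations landed independently and uniformly across the genome. The sketch below is illustrative only: it assumes about 70 de novo SNVs per genome and a 3 × 10^9 bp genome (neither figure comes from this text), and the approximation ignores chromosome boundaries and local variation in mutation rate.

```python
from math import comb

# Illustrative assumptions, not figures from the study.
DE_NOVO_SNVS_PER_GENOME = 70
GENOME_LENGTH_BP = 3_000_000_000
MAX_DISTANCE_BP = 2  # matches the 2 bp MNV window


def expected_close_pairs(n_mutations: int, genome_length: int, max_distance: int) -> float:
    """Expected number of mutation pairs at most max_distance bp apart,
    assuming positions are independent and uniform on one long sequence.

    Each unordered pair lands within max_distance bp with probability of
    roughly 2 * max_distance / genome_length (edge effects ignored).
    """
    return comb(n_mutations, 2) * 2 * max_distance / genome_length


expected = expected_close_pairs(DE_NOVO_SNVS_PER_GENOME, GENOME_LENGTH_BP, MAX_DISTANCE_BP)
print(f"expected by chance per genome: {expected:.2e}")          # → 3.22e-06
print(f"expected by chance in 283 trios: {expected * 283:.2e}")  # → 9.11e-04

# Lower end of the reported germline MNV rate: 1% of the SNV rate.
observed_low = 0.01 * DE_NOVO_SNVS_PER_GENOME
print(f"fold excess at the 1% rate: {observed_low / expected:.1e}")
```

Under these assumptions chance alone predicts roughly 3 × 10^-6 close pairs per genome, whereas even the 1% rate implies about 0.7 MNVs per genome, around five orders of magnitude more. This is why MNVs are attributed to dedicated mutational mechanisms rather than to coincident independent SNVs.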

