Abstract
Structural variants (SVs) rearrange large segments of DNA[1] and can have profound consequences in evolution and human disease[2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)[4] have become integral in the interpretation of single-nucleotide variants (SNVs)[5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage[6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings[7]. This SV resource is freely distributed via the gnomAD browser[8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
Highlights
Although SVs alter more nucleotides per genome than single-nucleotide variants (SNVs) and short insertion/deletion variants[1], surprisingly little is known about their mutational spectra on a global scale
Owing to the combination of these challenges, SV references are dwarfed by contemporary resources for short variants, such as the Exome Aggregation Consortium (ExAC) and its successor, the Genome Aggregation Database, which have jointly analysed more than 140,000 individuals[4,6]
Available resources such as ExAC and Genome Aggregation Database (gnomAD) have transformed many aspects of human genetics, including defining sets of genes constrained against damaging coding mutations[6] and providing frequency filters for variant interpretation[5]
Summary
The distribution of SVs across samples matched expectations based on human demographic history, with the top three components of genetic variance separating continental populations (Fig. 1d, Supplementary Fig. 13). As orthogonal support for these trends, we identified an inverse correlation between APS and SNV constraint across all functional categories of SVs, which was consistent with our observed depletion of rare, functional SVs in constrained genes (Extended Data Fig. 6f). These comparisons confirm that selection against most classes of gene-altering SVs mirrors patterns observed for short variants[18,30]. Among these was an example of localized chromosome shattering involving at least 49 breakpoints, yet resulting in largely balanced products, reminiscent of chromothripsis, in an adult with no known severe disease or DNA repair defect[13,14,22] (Fig. 6e, Extended Data Fig. 8). These analyses highlight the potential of gnomAD-SV and WGS-based SV methods to augment disease-association studies and clinical interpretation across a broad spectrum of variant classes and study designs. Using gnomAD-SV, we evaluated our ability to detect genomic disorders in WGS data by calculating CNV carrier frequencies for 49 genomic disorders across 10,047 unrelated samples with no known neuropsychiatric disease and found that CNV carrier frequencies in gnomAD-SV