Abstract

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, most studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole-genome sequencing data from 769 individuals in 250 Dutch families and provide a haplotype-resolved map of 1.9 million genome variants across 9 variant classes, including novel forms of complex indels and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion of these are previously underreported variants between 21 and 100 bp in size. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show that 191 known trait-associated SNPs are in strong linkage disequilibrium with SVs, and we demonstrate that our panel enables accurate imputation of SVs in unrelated individuals.
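The linkage disequilibrium (LD) claim can be made concrete with the standard r² statistic computed from phased haplotypes. The sketch below is a minimal illustration, assuming biallelic markers coded 0/1 on each haplotype (one entry per haplotype, so 2N entries for N individuals); the function name `ld_r2` is hypothetical, and the source does not state which LD measure or threshold defines "strong" LD here.

```python
from typing import Sequence


def ld_r2(snp: Sequence[int], sv: Sequence[int]) -> float:
    """Hypothetical helper: r^2 between a SNP and an SV from phased haplotypes.

    Each sequence holds one allele (0 = reference, 1 = alternative) per
    haplotype, in the same haplotype order for both markers.
    """
    if len(snp) != len(sv) or len(snp) == 0:
        raise ValueError("markers must cover the same, non-empty set of haplotypes")
    n = len(snp)
    p_a = sum(snp) / n  # alt-allele frequency of the SNP
    p_b = sum(sv) / n   # alt-allele frequency of the SV
    # Frequency of haplotypes carrying both alternative alleles.
    p_ab = sum(1 for a, b in zip(snp, sv) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return d * d / denom


# A SNP that perfectly tags an SV across four haplotypes gives r^2 = 1.0.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
```

Because r² is symmetric in allele coding, a SNP whose alternative allele always sits on the SV-free haplotype also yields r² = 1.0, which is what makes such SNPs usable as proxies when imputing SVs.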

Highlights


  • The 769 Genome of the Netherlands (GoNL) individuals come from parent-offspring families (231 trios and 19 families with twin pairs in the offspring generation), which yields high-quality, family-based haplotypes spanning substantially longer ranges than statistical phasing of unrelated individuals achieves[24,25]

  • To show the specificity of our structural variant predictions, we selected a representative set of candidates across all 9 variant types and validated them independently by PCR amplification across the variant breakpoints followed by Sanger or Illumina MiSeq sequencing (Supplementary Data 1)
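The advantage of the parent-offspring design can be illustrated with a minimal sketch of transmission-based phasing at a single biallelic site. It assumes error-free genotypes encoded as alternative-allele counts (0, 1 or 2); the function name `phase_child` is hypothetical, and this is a simplified illustration of the principle, not the GoNL phasing pipeline. Whenever at least one parent is homozygous, each of the child's alleles can be assigned unambiguously to the paternal or maternal haplotype; only sites where the child and both parents are heterozygous remain unresolved.

```python
from typing import Optional, Set, Tuple


def transmissible(genotype: int) -> Set[int]:
    """Alleles a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def phase_child(child: int, father: int, mother: int) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (paternal_allele, maternal_allele) for the child.

    Returns None if the site cannot be phased by transmission alone
    (child and both parents heterozygous) or if the trio genotypes are
    Mendelian-inconsistent (no allowed combination explains the child).
    """
    # Every (paternal, maternal) allele pair the parents could transmit
    # that adds up to the child's observed genotype.
    options = {
        (p, m)
        for p in transmissible(father)
        for m in transmissible(mother)
        if p + m == child
    }
    if len(options) == 1:
        return options.pop()
    return None


# Child heterozygous, father hom-ref, mother heterozygous:
# the alternative allele must be maternal.
print(phase_child(1, 0, 1))
```

In practice, phasing pipelines combine this transmission logic with statistical phasing and genotype-error handling; the sketch only shows why trios resolve most heterozygous sites without relying on population haplotype frequencies.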


Introduction

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. We focus on discovering, genotyping and phasing the full spectrum of structural variants in whole-genome sequencing data from 769 individuals in 250 Dutch families, generating a high-quality, SV-integrated, haplotype-resolved reference panel by exploiting two key features of the Genome of the Netherlands (GoNL) project design. In addition to creating this haplotype-resolved panel, we report several currently underreported variant types, such as deletions 21–100 bp in size, complex indels, inversions, mobile element insertions (MEIs), large replacements and insertions of new genomic sequence[26].
