Abstract

Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.

Introduction

Despite the progress in sampling many populations, human genomics research is still not fully reflective of the diversity found globally (Sirugo et al., 2019). Although whole-genome sequencing projects have provided unprecedented insights into the evolutionary history of our species, they have mostly concentrated on substitutions at individual sites. Structural variants (affecting 50 bp or more), which include deletions, duplications, inversions, and insertions, contribute a greater diversity at the nucleotide level than any other class of variation and are important in genome evolution and disease susceptibility (Huddleston and Eichler, 2016). We present the structural variation analysis of the Human Genome Diversity Project (HGDP)-Centre d’Etude du Polymorphisme Humain (CEPH) panel (Figure 1A), a dataset composed of 911 samples from 54 populations of linguistic, anthropological, and evolutionary interest (Cann et al., 2002). We generate a comprehensive resource of structural variants from these diverse and understudied populations, explore the structure of different classes of structural variation, characterize regional and population-specific variants and expansions, discover putatively introgressed variants, and identify sequences missing from the GRCh38 reference.
