Abstract
The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it represents only one haplotype among many in the human population. Using 10× Genomics (10×G) “Linked-Read” technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of previously undescribed genomic content. Among these, 64% are considered ancestral to humans because they are found in non-human primate genomes. Furthermore, 37% of the NUIs can be found in the human transcriptome, and 14% likely arose from Alu-recombination-mediated deletion. Our results underline the need for a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations.
Highlights
The human reference genome is used extensively in modern biological research
We find that these non-reference unique insertions (NUIs) follow a population-specific pattern, consistent with previous studies based on single nucleotide polymorphisms [13,14]
Based on the 1000 Genomes Project (1000GP), 14 individuals representing the populations most distinct from one another were selected for 10× Genomics (10×G) whole genome sequencing (WGS) using “Linked-Read” technology
Summary
A single consensus representation is inadequate to provide a universal reference structure because it represents only one haplotype among many in the human population. Our results underline the need for a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations. Despite tremendous sequencing effort and methodological advances, a comprehensive human reference genome set that represents the genetic variation across populations has yet to be realized. NUIs are full-length insertions that harbor at least 50 bp of non-repetitive sequence not found in the hg38 reference set (including its alternative haplotypes and patches). As sequencing of individuals across the world becomes routine, our results underline the need for a set of human reference genomes that fully incorporates the diversity of sequences across populations.