Abstract
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.
Highlights
To define regions of the genome with gaps that affect our analysis, we identified N-gaps in the hg38 reference spanning at least 50 kb, as well as regions in the reference where gaps between in silico predicted BspQI labels spanned at least 100 kb.
To build a preliminary list of putative complex regions, we examined consensus assembly genome coverage and identified areas covered by two or more meta scaffolds.
To minimize SV calling errors due to alignment errors, we considered only alignments with a confidence score higher than 9.
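The three filtering steps above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the half-open interval convention, and the dictionary key `confidence` are all assumptions made for illustration, and the inputs (a reference sequence string, predicted label positions, scaffold intervals, alignment records) are assumed to have been produced upstream. Only the thresholds (50 kb, 100 kb, two or more scaffolds, confidence above 9) come from the text.

```python
import re
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in bp (assumed convention)


def n_gaps(seq: str, min_len: int = 50_000) -> List[Interval]:
    """Return runs of N in a reference sequence that span at least min_len bp."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def label_gaps(label_positions: Iterable[int], min_len: int = 100_000) -> List[Interval]:
    """Return intervals between consecutive predicted labels spanning at least min_len bp."""
    pos = sorted(label_positions)
    return [(a, b) for a, b in zip(pos, pos[1:]) if b - a >= min_len]


def multi_covered(scaffolds: Iterable[Interval], min_depth: int = 2) -> List[Interval]:
    """Return regions covered by at least min_depth scaffold intervals (sweep line)."""
    events = []
    for start, end in scaffolds:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort so that an end (-1) precedes a start (+1) at the same position,
    # which means intervals that merely touch are not counted as overlapping.
    events.sort()
    depth, region_start, out = 0, None, []
    for pos, delta in events:
        prev, depth = depth, depth + delta
        if prev < min_depth <= depth:
            region_start = pos
        elif depth < min_depth <= prev and pos > region_start:
            out.append((region_start, pos))
    return out


def confident(alignments: Iterable[Dict], min_conf: float = 9.0) -> List[Dict]:
    """Keep alignments whose confidence is strictly higher than min_conf."""
    return [a for a in alignments if a["confidence"] > min_conf]
```

With toy inputs, `multi_covered([(0, 10), (5, 20), (15, 30)])` reports the two doubly covered stretches `(5, 10)` and `(15, 20)`. The strict comparison in `confident` mirrors the wording "higher than 9"; whether the original analysis treated a score of exactly 9 as passing is not stated.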
Summary
156 samples from 26 different populations were studied. Six unrelated samples per population (based on pedigree information from the 1KGP), three males and three females, were chosen. The corresponding lymphoblastoid cell lines (LCLs) were obtained from the Coriell Cell Repository. High-molecular-weight DNA was extracted, nicked, and labeled using the enzyme Nt.BspQI (New England Biolabs (NEB), Ipswich, MA, USA), and imaged using the Bionano Genomics Irys system (San Diego, CA, USA) to generate single-molecule maps for assembly and structural variation analysis.