Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

Mitchell R Vollger, Ashley D Sanders, David Porubsky, Urvashi Surti, Paul Peluso, Carl Baker, Arvis Sulovari, Gregory T Concepcion, Michael W Hunkapiller, Peter A Audano, Zev N Kronenberg, Katherine M Munson, Peter M Lansdorp, Glennis A Logsdon, Evan E Eichler, Aaron M Wenger, Diana C.J Spierings

doi:10.1111/ahg.12364

Abstract

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
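The abstract reports continuity as NG50, and the highlights use N50. As a minimal sketch of the standard definitions (not the authors' pipeline, which restricts the NG50 to sequence within 1 Mbp of the centromere), both can be computed from a list of contig lengths. The function names and the toy lengths below are illustrative assumptions.

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of genome_size.

    Returns None if the contigs sum to less than half of genome_size.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


def n50(contig_lengths):
    """N50 is NG50 with the total assembly length used as the genome size."""
    return ng50(contig_lengths, sum(contig_lengths))


# Toy contig lengths (hypothetical, not taken from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
toy_ng50 = ng50(toy_contigs, genome_size=400)
```

Because NG50 is normalized by a fixed genome (or region) size instead of the assembly's own total length, it stays comparable between two assemblies of the same sample that recover different amounts of sequence, which is the situation in a HiFi versus CLR comparison.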

Highlights

The long-read sequence data were of high quality, with an estimated 54.6% of the quality-filtered circular consensus sequence (CCS) reads having a quality value (QV) > 30 (Fig. S1B,C).
The shorter read length of the HiFi data (N50 10.9 vs. 17.5 kbp; Fig. S1A) might be expected to produce a less continuous assembly; however, the HiFi assembly had an N50 of 25.5 Mbp, comparable to that of the continuous long-read (CLR) assembly (29.3 Mbp; Table 1, Fig. 1).
We confirmed that these results were not driven by the different assembly algorithms, but rather by the different data types, by generating additional assemblies that controlled for input coverage and assembly algorithm (Table S1, Supplemental Note).
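The QV > 30 threshold in the highlights is easier to interpret as an error rate. Assuming the QV follows the usual Phred convention, QV = -10 log10(p), where p is the per-base error probability, a short sketch of the conversion follows. The function names are illustrative and do not come from the paper.

```python
import math


def qv_to_error_probability(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(error_probability):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_probability)


# Under this convention, QV 30 is about one expected error per 1,000 bases.
p_at_qv30 = qv_to_error_probability(30)
accuracy_at_qv30 = 1 - p_at_qv30
```

Under this assumption, a read with QV > 30 has an estimated accuracy above 99.9%.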

Summary

Introduction

Recent advances in long-read sequencing technologies, including Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have revolutionized the assembly of highly contiguous mammalian genomes (Bickhart et al, 2017; Chaisson et al, 2015; Gordon et al, 2016; Huddleston et al, 2017; Jain et al, 2018; Kronenberg et al, 2018; Low et al, 2019; Seo et al, 2016; Steinberg et al, 2016). Long-read de novo assemblies of human samples typically require 20,000–50,000 CPU hours (Chin et al, 2016; Koren et al, 2017) and terabytes of data storage. With 28-fold sequence coverage of the Genome in a Bottle Ashkenazim sample HG002, Wenger and colleagues demonstrated that it is possible to create a de novo assembly comparable to previous long-read assemblies with half the data and one-tenth the compute power (Wenger et al, 2019). While compute time and throughput have improved, there has been little comparison of the HiFi assembly quality of HG002 to a previous continuous long-read (CLR) HG002 genome assembly and limited assessment of the more difficult regions of the genome.


Journal: Annals of human genetics
Publication Date: Nov 11, 2019
Citations: 107
License type: other-oa
