Abstract
A human genome is sequenced and assembled de novo using a pocket-sized nanopore device.
Highlights
Sequencing data from the same DNA region (ENCFF835NTC)
Whole genome assembly (WGA) was performed with reads base-called by Metrichor
GRCh38 identities were computed from one-to-one alignments to the GRCh38 reference, including alt sites
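The sketch below shows one way an identity figure can be derived from alignments. It computes identity as matched columns divided by all aligned columns, using the CIGAR string and the SAM `NM` tag (edit distance: mismatches plus inserted and deleted bases). This common definition is an assumption. This section does not state the exact definition behind the reported GRCh38 identities, for example whether indels are counted per base or per event. The function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_columns(cigar: str) -> int:
    """Count alignment columns (M, I, D, =, X).

    Soft/hard clips (S/H), reference skips (N) and padding (P) are not
    alignment columns, so they are ignored.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIDX=")


def alignment_identity(cigar: str, nm: int) -> float:
    """Identity of a single alignment: (columns - NM) / columns.

    Assumes `nm` is the SAM NM tag, which counts every mismatched,
    inserted and deleted base (indels counted per base, not per event).
    """
    columns = aligned_columns(cigar)
    if columns == 0:
        raise ValueError("alignment has no aligned columns")
    return (columns - nm) / columns


def assembly_identity(alignments: Iterable[Tuple[str, int]]) -> float:
    """Pooled identity over many (cigar, nm) alignments, e.g. one-to-one
    contig-to-reference alignments, weighted by alignment length."""
    total_columns = 0
    total_edits = 0
    for cigar, nm in alignments:
        total_columns += aligned_columns(cigar)
        total_edits += nm
    if total_columns == 0:
        raise ValueError("no aligned columns")
    return (total_columns - total_edits) / total_columns
```

Pooling columns across alignments, rather than averaging per-alignment identities, keeps short alignments from dominating an assembly-level estimate.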
Summary
Sequencing data set
Five laboratories collaborated to sequence DNA from the GM12878 human cell line. To improve the accuracy of our assembly, we mapped previously generated whole-genome Illumina data (SRA: ERP001229) to each contig using BWA-MEM and corrected errors using Pilon. This improved the estimated accuracy of our assembly to 99.29% versus GRCh38 and 99.88% versus independent GM12878 sequencing (Table 1 and Supplementary Fig. 6)[26].
Newer PacBio assemblies of a human haploid cell line, with mean read lengths greater than 10 kb, have reached contig NG50s exceeding 20 Mbp at 60× coverage[25]. We subsampled this data set to a depth equivalent to ours (35×) and assembled it. The result was an NG50 of 5.7 Mbp, with the major histocompatibility complex (MHC) split into more than two contigs.
We found evidence for telomeric arrays spanning 2–11 kb within 14 subtelomeric regions of GM12878 (Fig. 5c,d and Supplementary Table 11).
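The comparison above relies on two simple calculations: the contig NG50 and the fraction of reads kept when subsampling to a lower depth. The sketch below shows both. It uses the standard NG50 definition, in which contigs are summed from longest to shortest until they cover half of an expected genome size. It also assumes uniform random read subsampling, which only approximately hits the target depth. The genome size used in this study is not given here, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Length of the contig at which the running total, summed from the
    longest contig down, first reaches half of `genome_size`.

    Unlike N50, the threshold uses the expected genome size rather than the
    assembly's own total length, so assemblies of different sizes are
    comparable. Returns None if the assembly covers less than half the genome.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def subsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep, under uniform random subsampling, to go
    from `current_depth` to roughly `target_depth` coverage."""
    if not 0 < target_depth <= current_depth:
        raise ValueError("target depth must be positive and at most the current depth")
    return target_depth / current_depth
```

For the PacBio comparison, `subsample_fraction(60, 35)` gives about 0.58, so roughly 58% of the reads are kept before reassembly.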