Abstract

Advances in long-read sequencing technologies have dramatically improved the contiguity and completeness of genome assemblies. Using the latest nanopore-based sequencers, we can generate enough data for the assembly of a human genome from a single flow cell. With the long-read data from these sequences, we can now routinely produce de novo genome assemblies in which half or more of a genome is contained in megabase-scale contigs. Assemblies produced from nanopore data alone, though, have relatively high error rates and can benefit from a process called polishing, in which more-accurate reads are used to correct errors in the consensus sequence. In this manuscript, we present a novel tool for genome polishing called JASPER (Jellyfish-based Assembly Sequence Polisher for Error Reduction). In contrast to many other polishing methods, JASPER gains efficiency by avoiding the alignment of reads to the assembly. Instead, JASPER uses a database of k-mer counts that it creates from the reads to detect and correct errors in the consensus. Our experiments demonstrate that JASPER is faster than alignment-based polishers, and both faster and more accurate than other k-mer based polishing methods. We also introduce the idea of using a polishing tool to create population-specific reference genomes, and illustrate this idea using sequence data from multiple individuals from Tokyo, Japan.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.