Abstract

Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore hard to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher that uses alignments in which each read is placed at all of its possible locations, not just a single best location, to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.

Highlights

  • Long-read-only genome assemblies are inferred using Oxford Nanopore Technologies (ONT) or Pacific Biosciences (PacBio) sequencing reads

  • Oxford Nanopore genome sequences suffer from errors that limit their utility in downstream analyses

  • Several polishing tools can fix most errors in an Oxford Nanopore genome assembly, but they struggle with errors in repetitive regions of the genome


Summary

Introduction

Long-read-only genome assemblies are inferred using Oxford Nanopore Technologies (ONT) or Pacific Biosciences (PacBio) sequencing reads. Systematic errors in long reads can lead to hundreds of residual errors in long-read-only assemblies of bacterial genomes, most of which are indels in homopolymer sequences [2, 3]. When these errors occur in protein-coding sequences, they cause frameshifts in the open reading frame, leading to problems with genome annotation and limiting the utility of long-read-only assemblies [4]. Short reads from Illumina platforms do not suffer from the same homopolymer errors as long reads [5]. Hybrid assembly, which uses short and long reads together, can produce sequences that are both complete and highly accurate. Long-read-first hybrid assemblies can be more accurate than short-read-first hybrid assemblies, but errors often remain in repetitive regions of the genome [3].
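To make the frameshift problem concrete, the following is a minimal Python sketch, not taken from the paper. The toy gene, the names `true_gene` and `draft_gene`, and the `translate` helper are all hypothetical. It shows how a single-base homopolymer-length error changes every downstream codon.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from position 0 until the first stop codon or the last full codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene containing a homopolymer run of six A's.
true_gene = "ATGGCAAAAAAGTTCTGGACCTGA"
# The same gene with the run miscalled as five A's (a 1 bp deletion),
# the kind of homopolymer-length error described above.
draft_gene = "ATGGCAAAAAGTTCTGGACCTGA"

print(translate(true_gene))   # protein from the correct sequence
print(translate(draft_gene))  # protein after the frameshift
```

In this toy case the two translations agree only up to the homopolymer run. After it, every residue differs and the original stop codon is no longer read in frame, which is why such errors disrupt gene annotation.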

