Abstract

Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or SMRT Pacific Biosciences (PacBio) sequencing platforms. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies, as well as comparing to long-read-only assembly with Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

Highlights

  • The rapid development of microbial genome sequencing methods over the last decade has revolutionized infectious disease epidemiology, and whole genome sequencing has become the standard for many molecular typing applications in research and public health [1,2,3,4] (De Maio et al., Microbial Genomics 2019;5)

  • The mean percentage identity and identity N50 for reads aligned against their respective assemblies were higher for Oxford Nanopore Technologies (ONT) reads than Pacific Biosciences (PacBio) reads

  • Combining short-read Illumina sequencing with different long-read sequencing technologies and using Unicycler, a publicly available and widely used hybrid assembly tool, we found that ONT+Illumina hybrid assembly generally facilitates the complete assembly of complex bacterial genomes without additional manual steps

Summary

Introduction

The rapid development of microbial genome sequencing methods over the last decade has revolutionized infectious disease epidemiology, and whole genome sequencing has become the standard for many molecular typing applications in research and public health [1,2,3,4]. It has become clear that short-read sequencing has significant limitations, depending on the bacterial species and/or the epidemiological question. These limitations arise largely from the inability to fully reconstruct genomic structures of interest from short reads, including both those on chromosomes and those on mobile genetic elements such as plasmids [5]. An example where this genomic structure is highly relevant is the study of antimicrobial resistance (AMR) gene transmission and evolution in species of Enterobacteriaceae, which have emerged as a major clinical problem in the last decade [6]. Short-read data from these species do not allow assembly of repetitive structures that extend beyond the maximum read length, including resistance gene cassettes, insertion sequences and transposons, which are of crucial biological relevance to understanding the dissemination of key AMR genes.
