Abstract

Long-read sequencing (LRS) can resolve repetitive regions, a limitation of short-read (SR) data. Reduced cost and instrument size have led to a steady increase in LRS across diagnostics and research. Here, we re-basecalled FAST5 data sequenced between 2018 and 2021 and analyzed the data in relation to gDNA across a large dataset (n = 200) spanning a wide range of GC content (25–67%). We examined whether re-basecalled data would improve the hybrid assembly and, for a smaller cohort, compared long-read (LR) assemblies in the context of antimicrobial resistance (AMR) genes (ARGs) and mobile genetic elements. We included a cost analysis when comparing SR and LR instruments. We compared the R9 and R10 chemistries and reported not only a larger yield but also increased read quality with R9 flow cells. There were often discrepancies in ARG presence/absence and/or variant detection in LR assemblies. Flye-based assemblies were generally efficient at detecting the presence of ARGs on both the chromosome and plasmids. Raven performed more quickly but inconsistently recovered small plasmids, notably a ∼15-kb Col-like plasmid harboring blaKPC. Canu assemblies were the most fragmented, with genome sizes larger than expected. LR assemblies failed to consistently determine multiple copies of the same ARG as identified by the Unicycler reference. Even with improvements to Oxford Nanopore Technologies (ONT) chemistry and basecalling, long-read assemblies can lead to misinterpretation of data. If LR data are currently being relied upon, it is necessary to perform multiple assemblies, although this is resource (computing) intensive and not yet readily available/usable.

Highlights

  • Antimicrobial resistance (AMR) is a serious global threat with the WHO identifying that a “post-antibiotic era—in which common infections and minor injuries can kill—far from being an apocalyptic fantasy, is instead a very real possibility for the twenty-first Century” (WHO Antimicrobial Resistance Division, 2014)

  • All isolates had fewer than 70 contigs and n = 180/200 had fewer than 20 contigs (Figure 3)

  • Single-nucleotide variations (SNVs) and insertions in long-read (LR) assemblies were predominantly seen in poly A/T regions, as was evident for Mollicutes, suggesting that basecalling errors could be exacerbated in bacterial species with lower GC content
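The link between GC content and poly A/T regions noted in the last highlight can be explored with a short script. The following is a minimal sketch, not the authors' pipeline: the function names `gc_content` and `poly_at_runs`, the `min_len` threshold of 5 bases, and the toy sequence are all assumptions made for illustration.

```python
import re


def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def poly_at_runs(seq: str, min_len: int = 5):
    """Return (start, end, base) for each A or T homopolymer run.

    Coordinates are 0-based and end-exclusive. min_len is a hypothetical
    threshold chosen for illustration, not a value taken from the study.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()[0])
            for m in pattern.finditer(seq.upper())]


# Toy sequence (hypothetical): one 6-base A run and one 7-base T run.
toy = "ATG" + "A" * 6 + "GC" + "T" * 7 + "CGCGAT" + "AAAT"

if __name__ == "__main__":
    print(f"GC content: {gc_content(toy):.2f}")
    for start, end, base in poly_at_runs(toy):
        print(f"poly-{base} run at [{start}, {end}), length {end - start}")
```

Applied to an assembly, run coordinates of this kind could be intersected with SNV and insertion positions to check whether errors cluster in homopolymer tracts.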

Summary

Introduction

Antimicrobial resistance (AMR) is a serious global threat with the WHO identifying that a “post-antibiotic era—in which common infections and minor injuries can kill—far from being an apocalyptic fantasy, is instead a very real possibility for the twenty-first Century” (WHO Antimicrobial Resistance Division, 2014). AMR accounts for 700,000–750,000 annual deaths worldwide (DAG Hammarskjöld Foundation, 2019; IAGC, 2019). The Review on Antimicrobial Resistance (O’Neill, 2016) estimated the global burden of AMR to be 10 million deaths by 2050, an estimate provided well before the uptick seen during COVID-19 (Lai et al, 2021).

The PacBio platform can offer fully circularized chromosomal and plasmid DNA with high-fidelity (HiFi) reads; access to this 354-kg instrument and the equipment/services to prepare large yields of high-quality DNA are obvious limitations. PacBio and Illumina platforms have challenging equipment requirements, and both require accounting for and adapting to GC bias. Oxford Nanopore Technologies (ONT) platforms are considerably smaller with fewer laboratory requirements, offer “on the spot” analysis, and reduce GC bias, but have historically suffered from low read accuracy and stochastic read-depth-dependent errors (Chen et al, 2021; Vasiljevic et al, 2021).

