Abstract

Whole-genome sequencing of viral isolates is critical for understanding transmission patterns and the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and the resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. The differences identified were driven mainly by ambiguous site content. When ambiguous sites were ignored, differences remained in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
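The comparison described above, counting differences between matched consensus genomes with and without ambiguous sites, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `compare_consensus`, the assumption that the two sequences are already aligned to equal length, the choice to treat every character other than A, C, G, and T as ambiguous, and the definition of identity as the percentage of matching positions are all assumptions made for illustration.

```python
# Hypothetical helper, not taken from the study's pipelines.
# Assumption: anything other than A, C, G, T (e.g. N or IUPAC codes) is ambiguous.
UNAMBIGUOUS = set("ACGT")


def compare_consensus(seq_a: str, seq_b: str) -> dict:
    """Compare two pre-aligned consensus sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")

    all_diffs = 0          # every mismatching position
    unambiguous_diffs = 0  # mismatches where both bases are A, C, G, or T

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        all_diffs += 1
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            unambiguous_diffs += 1

    # Assumed definition: percentage of positions that match exactly.
    identity = 100.0 * (len(seq_a) - all_diffs) / len(seq_a)
    return {
        "identity_percent": identity,
        "all_differences": all_diffs,
        "unambiguous_differences": unambiguous_diffs,
    }


# Toy example: one ambiguous-site difference (A vs N) and one true SNV (C vs T).
lab_a = "ACGTACGTAC"
lab_b = "ACGTNCGTAT"
result = compare_consensus(lab_a, lab_b)
print(result)
```

In this toy pair, two positions differ, but only one remains once ambiguous sites are ignored. The same effect explains why a laboratory's variant allele frequency threshold, which determines when a site is called ambiguous, can change single nucleotide variant distances between otherwise matched genomes.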

Highlights

  • The first genome sequences for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative pathogen of coronavirus disease 2019 (COVID-19), were released in January 2020 [1,2]

  • Despite recovering some differences between matched samples, these differences constituted only a small fraction of the SARS-CoV-2 genomes. These small differences had no impact on Pango lineage assignment or on placement of samples in a phylogenetic tree in the vast majority of cases

  • (a) SARS-CoV-2 nucleic acid extracts are relatively robust to transport on dry ice between laboratories given proper handling, which is promising for laboratories that can test for SARS-CoV-2 positivity but do not have whole-genome sequencing (WGS) capacity; and (b) any minor differences between sequencing laboratories will likely have a negligible impact on molecular epidemiological inferences based on Pango lineage calls or placement in a phylogenetic tree

Summary

Introduction

The first genome sequences for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative pathogen of coronavirus disease 2019 (COVID-19), were released in January 2020 [1,2]. Routine whole-genome sequencing (WGS) has been adopted globally as SARS-CoV-2 infections increase exponentially. The rapidly accumulating wealth of genomic data available for SARS-CoV-2 is crucial for real-time research into the origins and ongoing evolution of the virus. Comparison of SARS-CoV-2 genome sequences isolated from different patients can inform traditional epidemiological methods and allow the reconstruction of transmission chains [5,6,7,8,9]. Classifying SARS-CoV-2 isolates into Pango lineages is important for determining whether any diagnostic mutations, especially amino acid replacements, can be linked to increased transmission and/or any differences in disease severity in patients with

