Abstract

The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, producing a tremendous amount of viral genome sequencing data. To help trace infection pathways and design preventive strategies, a deep understanding of the landscape of viral genetic diversity is needed. We present here a set of genomic surveillance tools from population genetics that can be used to better understand the evolution of this virus in humans. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic. We analyzed 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets. This approach enables real-time lineage identification, a clear description of the relationships between variants of concern, and efficient detection of recurrent mutations. Furthermore, the change in Tajima's D over time, computed per haplotype, provides a powerful metric of lineage expansion. Finally, principal component analysis (PCA) highlights key steps in variant emergence and facilitates the visualization of genomic variation in the context of SARS-CoV-2 diversity. The computational framework presented here is simple to implement, is insightful for real-time genomic surveillance of SARS-CoV-2, and could be applied to any pathogen that threatens the health of populations of humans and other organisms.
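Tajima's D compares two estimates of genetic diversity: the mean number of pairwise differences between sequences (π) and the number of segregating sites S divided by a1 (Watterson's estimator). Negative values indicate an excess of rare variants, the pattern expected when a lineage is expanding. The abstract does not describe the authors' implementation, so the following is only a minimal Python sketch of the standard Tajima (1989) statistic, assuming aligned consensus sequences of equal length with no gaps or ambiguous bases. The helper `tajimas_d_by_window` and its `min_n` parameter are hypothetical, showing one way a per-haplotype time series could be assembled.

```python
import math
from collections import Counter, defaultdict


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.

    Assumption: no gaps or ambiguous bases (e.g. N); every distinct
    character in a column is counted as a separate allele.
    """
    n = len(seqs)
    if n < 4:
        # The variance constants e1 and e2 are zero for n < 4, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to equal length")

    seg = 0    # S: number of segregating (polymorphic) sites
    pi = 0.0   # mean number of pairwise differences
    for j in range(length):
        counts = Counter(seq[j] for seq in seqs)
        if len(counts) > 1:
            seg += 1
            # Differing pairs at this site = (n^2 - sum of squared allele
            # counts) / 2; dividing by the n(n-1)/2 pairs avoids an O(n^2) loop.
            pi += (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    if seg == 0:
        return float("nan")  # no variation: D is undefined

    # Normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


def tajimas_d_by_window(samples, min_n=4):
    """Hypothetical helper: Tajima's D per time window.

    `samples` is an iterable of (window, sequence) pairs, e.g. all sequences
    of one haplotype keyed by sampling week. Windows with fewer than
    `min_n` sequences (min_n should be at least 4) are skipped.
    """
    groups = defaultdict(list)
    for window, seq in samples:
        groups[window].append(seq)
    return {w: tajimas_d(g) for w, g in sorted(groups.items()) if len(g) >= min_n}
```

For example, four sequences carrying one singleton variant (`["A", "A", "A", "T"]`) give a negative D, while a variant at intermediate frequency (`["A", "A", "T", "T"]`) gives a positive one. Under this sketch, a run of increasingly negative values across successive windows for a single haplotype would be consistent with that lineage expanding.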

Highlights

  • The Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus responsible for the ongoing pandemic

  • Our pipeline flags a series of systematic errors introduced by sequencing and bioinformatic methodologies (Methods), which were more common in the first months of the pandemic

  • Worldwide efforts to sequence and share thousands of viral genomes made it possible to track SARS-CoV-2 evolution in depth over time as the virus spread across the world


Summary

Introduction

The Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus responsible for the ongoing pandemic. SARS-CoV-2 has a positive-sense, single-stranded RNA genome of 29,903 nucleotides. The virus is transmitted from person to person through respiratory droplets. Genomic surveillance, the identification of variants of concern (VOC), and the assessment of their impact on transmission, disease severity and immune response are of tremendous importance to pandemic control, most notably in the context of worldwide vaccination efforts. In this context, an unparalleled wealth of chronologically and globally sampled viral genomes has been sequenced in a concerted international effort and submitted to public databases such as the Global Initiative for Sharing All Influenza Data (GISAID) [1].
