Abstract

The world is experiencing one of the most severe viral outbreaks of recent years: the pandemic caused by SARS-CoV-2, the causative agent of COVID-19. As of December 10th, 2021, the virus has spread worldwide, with more than 267 million confirmed cases (four times more in the last year) and more than 5 million deaths. A great effort has been undertaken to characterize the virus molecularly and to track the spread of different variants across the globe, with the aim of understanding their potential effects on transmission capability and fatality rates. Here we focus on the genomic diversity and distribution of the virus in the early stages of the pandemic, to better characterize the origin of COVID-19 and to define the geographical and temporal evolution of genetic clades. By performing a comparative analysis of 75401 reported SARS-CoV-2 sequences (as of December 2020), using as reference the first viral sequence reported in Wuhan in December 2019, we describe 26538 genetic variants, the most frequent of which cluster into four major clades, each characterized by a specific geographical distribution. Notably, we found the most frequent variant, the previously reported missense p.Asp614Gly in the S protein, as a single mutation in only three patients, whereas in the large majority of cases it occurs together with three other variants, suggesting strong linkage and that this variant alone might not provide a significant selective advantage to the virus. Moreover, we evaluated the presence and distribution in our dataset of the mutations characterizing the so-called “British variant”, identified at the beginning of 2021, and observed that 9 out of 17 are present in only a few sequences, but never in linkage with each other, suggesting a synergistic effect in this new viral strain.
In summary, this is a large-scale analysis of deposited SARS-CoV-2 sequences, with a particular focus on the geographical and temporal evolution of genetic clades in the early phase of the COVID-19 pandemic.

Highlights

  • The first case of COVID-19 was reported in Wuhan, China, in December 2019 and, despite the relatively low mortality and the high percentage of asymptomatic or pauci-symptomatic subjects, the outbreak caused a dramatic collapse of the health care system in the hardest-hit countries, such as Italy, where the mortality rate reached over 14% between May and August 2020

  • The geographical distribution of available sequences is reported in S1 Fig. Almost 58% of all available sequences were obtained from European centers, principally in the United Kingdom, followed by North America (~20%), Oceania (~11%) and Asia (~8%)

  • Despite the limitations of this study, such as the fact that most sequences in the first year of the COVID-19 pandemic were reported by the UK, and the unavailability of clinical data to correlate the different variants with disease outcome, our analysis provides a portrait of the temporal and spatial spread of SARS-CoV-2 during the first phases of the pandemic

Introduction

SARS-CoV-2 variants and haplotypes debated zoonotic origin [2], the clinical symptoms [3, 4], the risk factors, the potential treatments, in the urgent effort to contain the infection, to predict potentially serious disease outcomes, and to find a cure. The circulation of SARS-CoV-2 before the pandemic declaration has been ascertained, and several efforts have been made to track its worldwide spread and the genetic changes that gave rise to different viral strains. The first case of COVID-19 was reported in Wuhan, China, in December 2019 and, despite the relatively low mortality (approximately 2% on average worldwide, as of December 10th, 2021, https://ourworldindata.org/mortality-risk-covid#the-case-fatality-rate) and the high percentage of asymptomatic or pauci-symptomatic subjects (over 80%), the outbreak caused a dramatic collapse of the health care system in the hardest-hit countries, such as Italy, where the mortality rate reached over 14% between May and August 2020. The existence of different strains, and their temporal and geographical distribution, could provide relevant information on how the virus spread all over the world, on the possible acquisition of selective advantages, and on the most conserved sequences suitable for vaccine design.
