Abstract

The World Health Organization characterized COVID-19 as a pandemic in March 2020, the second pandemic of the twenty-first century. Expanding virus populations, such as that of SARS-CoV-2, accumulate a number of narrowly shared polymorphisms, imposing a confounding effect on traditional clustering methods. In this context, approaches that reduce the complexity of the sequence space occupied by the SARS-CoV-2 population are necessary for robust clustering. Here, we propose subdividing the global SARS-CoV-2 population into six well-defined subtypes and 10 poorly represented genotypes named tentative subtypes by focusing on the widely shared polymorphisms in nonstructural (nsp3, nsp4, nsp6, nsp12, nsp13 and nsp14) cistrons and structural (spike and nucleocapsid) and accessory (ORF8) genes. The six subtypes and the additional genotypes showed amino acid replacements that might have phenotypic implications. Notably, three mutations (one of them in the Spike protein) were responsible for the geographical segregation of subtypes. We hypothesize that the virus subtypes detected in this study are records of the early stages of SARS-CoV-2 diversification that were randomly sampled to compose the virus populations around the world. The genetic structure determined for the SARS-CoV-2 population provides substantial guidelines for maximizing the effectiveness of trials for testing candidate vaccines or drugs.

Highlights

  • The World Health Organization characterized COVID-19 as a pandemic in March 2020, the second pandemic of the twenty-first century

  • Populations of RNA viruses evolve rapidly due to their large sizes, short generation times, and high mutation rates, the last of which is a consequence of RNA-dependent RNA polymerase (RdRP), which lacks proofreading activity [7]

  • To better understand the diversification of SARS-CoV-2 genomes during the pandemic, we applied a simple but robust approach to reduce the complexity of the sequence space occupied by the virus population by detecting its widely shared polymorphisms
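The idea of keeping only widely shared polymorphisms can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `min_freq` threshold, and the toy alignment are all assumptions made for illustration, and the input is assumed to be a list of already aligned, equal-length nucleotide strings.

```python
from collections import Counter


def widely_shared_sites(aligned, min_freq=0.1):
    """Return {column index: base counts} for columns whose second most
    common base is carried by at least `min_freq` of all sequences.

    `min_freq` is a hypothetical cutoff; the source does not state one.
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    sites = {}
    for col in range(length):
        # Count unambiguous bases only; gaps and N are ignored.
        counts = Counter(seq[col] for seq in aligned if seq[col] in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] / n >= min_freq:
            sites[col] = counts
    return sites


def genotype(seq, sites):
    """Collapse a sequence to its bases at the retained columns."""
    return "".join(seq[col] for col in sorted(sites))


# Toy alignment (invented data): column 1 varies in 3 of 10 sequences,
# column 5 varies in a single sequence.
toy = ["ACGTAC"] * 6 + ["ATGTAC"] * 3 + ["ACGTAG"]
shared = widely_shared_sites(toy, min_freq=0.2)
groups = Counter(genotype(seq, shared) for seq in toy)
```

In this toy case only column 1 passes the cutoff, so the singleton change in column 5 no longer splits the sequences, and `groups` counts two reduced genotypes instead of three full-length ones. Clustering on such reduced genotypes is one way to limit the confounding effect of narrowly shared polymorphisms.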


Summary

Introduction

The World Health Organization characterized COVID-19 as a pandemic in March 2020, the second pandemic of the twenty-first century. The data set was composed of hundreds of SARS-CoV-2 genomes sampled from around the world. In the S2-based tree, we observed two clusters (Fig. S1a).

