Abstract
The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence. We estimated the diversity of each site in a multiple sequence alignment (MSA) of the Spike (S) proteins from close relatives of SARS-CoV-2 that infected bats and pangolins before the pandemic. We then compared the locations of high-diversity sites in this MSA with those of mutations found in multiple emerging lineages of human-infecting SARS-CoV-2. This comparison revealed a significant correspondence, which suggests that a limited number of sites in this protein are repeatedly substituted in different lineages of this group of viruses. It follows, therefore, that the sites of future emerging mutations in SARS-CoV-2 can be predicted by analyzing their relatives (outgroups) that have infected non-human hosts. We discuss a possible evolutionary basis for these substitutions and provide a list of frequently substituted sites that potentially include future emerging variants in SARS-CoV-2.
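The abstract does not say which statistical test underlies the "significant correspondence". As a minimal sketch, assuming a one-sided hypergeometric test is an acceptable stand-in, the overlap between high-diversity sites and mutated sites could be assessed as follows. The function name `overlap_p_value` and its arguments are hypothetical, and the positions in the usage note are toy values, not data from the study.

```python
from math import comb


def overlap_p_value(n_sites, high_diversity, mutated):
    """Return (overlap, p) for a one-sided hypergeometric test.

    n_sites        -- total number of alignment positions considered
    high_diversity -- positions flagged as high diversity in the outgroup MSA
    mutated        -- positions mutated in emerging SARS-CoV-2 lineages

    p is the probability of seeing at least the observed overlap if the
    mutated positions were placed uniformly at random (an assumption of
    this sketch, not a statement about the paper's method).
    """
    high = set(high_diversity)
    mut = set(mutated)
    k = len(high & mut)          # observed overlap
    n_high = len(high)
    n_mut = len(mut)
    total = comb(n_sites, n_mut)  # all ways to place the mutated positions
    tail = sum(
        comb(n_high, i) * comb(n_sites - n_high, n_mut - i)
        for i in range(k, min(n_high, n_mut) + 1)
    )
    return k, tail / total
```

For example, `overlap_p_value(10, {1, 2, 3}, {2, 3, 9})` treats a toy 10-site protein in which two of the three mutated positions fall on high-diversity sites; a small p-value would indicate more overlap than expected by chance.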
Highlights
The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence
We characterized the importance of each residue position in the S protein by comparing its diversity in SARS-CoV-2 with that in relatives that infected bats or pangolins, using a simple equation:

$$\text{Importance} = \text{diversity}(\text{SARS-CoV-2} + \text{outgroup}) - \text{diversity}(\text{SARS-CoV-2}) \tag{1}$$
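The highlight above does not define the diversity measure. As a minimal sketch, assuming per-column Shannon entropy is used as the diversity and that gap characters are counted as ordinary symbols, the importance in Equation (1) could be computed for aligned sequences as follows. The function names `column_entropy` and `site_importance` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def site_importance(sars2_seqs, outgroup_seqs):
    """Per-site importance as in Eq. (1):
    diversity(SARS-CoV-2 + outgroup) - diversity(SARS-CoV-2).

    Both arguments are lists of equal-length strings taken from the same
    alignment. Shannon entropy as the diversity is an assumption of this
    sketch, not a detail confirmed by the source.
    """
    combined = sars2_seqs + outgroup_seqs
    length = len(combined[0])
    if any(len(seq) != length for seq in combined):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        h_all = column_entropy([seq[i] for seq in combined])
        h_sars2 = column_entropy([seq[i] for seq in sars2_seqs])
        scores.append(h_all - h_sars2)
    return scores
```

Calling `site_importance(["ACD", "ACD"], ["ACE", "AGE"])` on these toy sequences gives a score of zero for the first column, which is conserved everywhere, and positive scores for the two columns that vary only once the outgroup is included.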
A natural question is why a limited set of sites with high diversity in outgroups have recently been substituted in SARS-CoV-2
Summary
The changes in this region seem to be host specific: HSMSS[LF]R in pangolin; QTQTNSR in two lineages of bat; QTQTNSPRRAR (which includes a polybasic insertion recognized by the host’s protease [7,8]) in human. Modification of the regions discussed above could affect infectivity or enable the virus to escape from the host’s immune system, albeit temporarily, as the change will inevitably be counteracted by a shift in the antibody repertoire of the host, resulting in an effective “arms race”, as reviewed in references [10,11]. In this scenario, the sites with higher diversity imply direct or indirect host–pathogen interactions and are in a constant state of flux. If richer sequence data of outgroups infecting bat, pangolin and other possible hosts becomes available, it would shed light on the origin of SARS-CoV-2 [20], and give us an advantage in the arms race with this virus.