Abstract

The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence. We estimated the diversity of each site in a multiple sequence alignment (MSA) of the Spike (S) proteins from close relatives of SARS-CoV-2 that infected bats and pangolins before the pandemic. We then compared the locations of high-diversity sites in this MSA with those of mutations found in multiple emerging lineages of human-infecting SARS-CoV-2. This comparison revealed a significant correspondence, which suggests that a limited number of sites in this protein are repeatedly substituted in different lineages of this group of viruses. It follows, therefore, that the sites of future emerging mutations in SARS-CoV-2 can be predicted by analyzing its relatives (outgroups) that have infected non-human hosts. We discuss a possible evolutionary basis for these substitutions and provide a list of frequently substituted sites that potentially include future emerging variants in SARS-CoV-2.
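
The comparison between high-diversity sites and observed mutation sites can be illustrated with a simple overlap test. The text here does not state which statistical test was used, so the sketch below assumes a one-sided hypergeometric test; the function name `overlap_p_value` and all site numbers are hypothetical and invented purely for illustration.

```python
from math import comb

def overlap_p_value(n_sites, high_diversity, mutated):
    """Return (overlap size, P(overlap >= observed)) under random placement.

    Assumption: a one-sided hypergeometric tail; this is an illustrative
    choice, not necessarily the test used in the paper.
    """
    high_diversity, mutated = set(high_diversity), set(mutated)
    k = len(high_diversity & mutated)          # observed overlap
    K, n = len(high_diversity), len(mutated)   # sizes of the two site sets
    total = comb(n_sites, n)                   # ways to place n mutated sites
    tail = sum(
        comb(K, i) * comb(n_sites - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return k, tail / total

# Toy positions on a 20-site alignment (hypothetical, not real data)
k, p = overlap_p_value(20, {2, 5, 7, 11}, {5, 7, 11, 19})
print(k, round(p, 4))
```

A small p-value in such a test would indicate that the two sets of sites coincide more often than expected if mutations fell at random positions along the alignment.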

Highlights

  • The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence

  • We characterized the importance of each residue position in the S protein by comparing its diversity in SARS-CoV-2 with that in relatives that infected bats or pangolins by using a simple equation: Importance = diversity(SARS-CoV-2 + outgroup) − diversity(SARS-CoV-2)   (1)

  • A natural question is why a limited set of sites with high diversity in outgroups has recently been substituted in SARS-CoV-2
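
The per-site importance score from the equation in the highlights can be sketched in a few lines. This is a minimal illustration only: the diversity measure is not defined in this excerpt, so Shannon entropy per alignment column is assumed here, gaps are simply ignored, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

def column_diversity(column):
    """Shannon entropy (bits) of one alignment column.

    Assumption: entropy stands in for the paper's diversity measure,
    and gap characters ('-') are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def importance(sars2_seqs, outgroup_seqs):
    """Per-site diversity(SARS-CoV-2 + outgroup) minus diversity(SARS-CoV-2)."""
    combined = sars2_seqs + outgroup_seqs
    length = len(combined[0])
    if any(len(s) != length for s in combined):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        d_all = column_diversity([s[i] for s in combined])
        d_sars2 = column_diversity([s[i] for s in sars2_seqs])
        scores.append(d_all - d_sars2)
    return scores

# Toy aligned fragments, invented for illustration (not real Spike data)
sars2 = ["NSPR", "NSPR", "NSPR"]
outgroup = ["NS-R", "HS-R", "NSLR"]
scores = importance(sars2, outgroup)
print([round(s, 3) for s in scores])
```

Under these assumptions, sites that are conserved within SARS-CoV-2 but variable once the outgroups are added receive a high score, which is the pattern the equation is meant to capture. Ignoring gaps means insertions and deletions do not contribute; a real analysis would need to decide how to treat them.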


Summary

Results and discussion

The changes in this region seem to be host specific: HSMSS[LF]R in pangolin; QTQTNSR in two lineages of bat; QTQTNSPRRAR (which includes a polybasic insertion recognized by the host’s protease [7,8]) in human. Modification of the regions discussed above could affect infectivity or enable the virus to escape from the host’s immune system, albeit temporarily, as the change will inevitably be counteracted by a shift in the antibody repertoire of the host, resulting in an effective “arms race”, as reviewed in references [10,11]. In this scenario, the sites with higher diversity imply direct or indirect host–pathogen interactions and are in a constant state of flux. If richer sequence data from outgroups infecting bats, pangolins and other possible hosts were to become available, it would shed light on the origin of SARS-CoV-2 [20] and give us an advantage in the arms race with this virus.

Limitations
Methods
Code availability