Since the onset of pandemic in 2019, SARS-CoV-2 has diverged into numerous variants driven by antigenic and infectivity-oriented selection. Some variants have accumulated fitness-enhancing mutations, evaded immunity and spread despite global vaccination campaigns. The spike (S) glycoprotein of SARS-CoV-2 demonstrated the greatest immunogenicity and amino acid substitution diversity owing to its importance in the interaction with human angiotensin receptor 2 (hACE2). The S protein consistently emerges as an amino acid substitution (AAS) hotspot in all six lineages, however, in Omicron this enrichment is significantly higher. This study attempts to design and validate a method of mapping S-protein substitution profile across variants to identify the conserved and AAS regions. A substitution matrix was created based on publicly available databases, and the substitution localization was illustrated on a cryo-electron microscopy generated S-protein model. Our analyses indicated that the diversity of N-terminal (NTD) and receptor-binding (RBD) domains exceeded that of any other regions but still contained extended low substitution density regions particularly considering significantly broader substitution profiles of Omicron BA.2 and BA.4/5. Finally, the substitution matrix was compared to a random sample alignment of variant sequences, revealing discrepancies. Therefore, it was suggested to improve matrix accuracy by processing a large number of S-protein sequences using an automated algorithm. Several critical immunogenic and receptor-interacting residues were identified in the conserved regions within NTD and RBD. In conclusion, the structural and topological analysis of S proteins of SARS-CoV-2 variants highlight distinctive amino acid substitution patterns which may be foundational in predicting future variants.
Read full abstract