Abstract

Defining the unique properties of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) protein sequences has the potential to explain the range of Coronavirus Disease 2019 severity. To achieve this, we compared proteins encoded by all Sarbecoviruses using profile Hidden Markov Model similarities to identify protein features unique to SARS-CoV-2. Consistent with previous reports, a small set of bat- and pangolin-derived Sarbecoviruses show the greatest similarity to SARS-CoV-2 but are unlikely to be the direct source of SARS-CoV-2. Three proteins (nsp3, spike, and orf9) showed regions that differ between the bat Sarbecoviruses and SARS-CoV-2, indicating virus protein features that might have evolved to support human infection and/or transmission. Spike analysis identified all regions of the protein that have tolerated change and revealed that the current SARS-CoV-2 variants of concern have sampled only a fraction (∼31 per cent) of the possible spike domain changes that have occurred historically in Sarbecovirus evolution. This result emphasises the evolvability of these coronaviruses and the potential for further change in virus replication and transmission properties over the coming years.

Highlights

  • Since the first report of Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019 in Wuhan city, China (Li et al, 2020; Yang et al, 2020) and the declaration of COVID-19 as a global pandemic in March 2020 by the World Health Organization, the disease has proceeded to affect every part of the world

  • We have explored the genomes across the Sarbecovirus subgenus using profile hidden Markov models (pHMMs). pHMMs provide a detailed statistical description of an amino acid sequence and can be used to detect related domains and to document their differences from a reference domain (Eddy 1996, 1998)

  • Five CoV sequences from pangolins were included in this analysis (Supplementary Table S1), including four generated by Lam et al (2020) after sequencing the original samples described by Liu, Chen, and Chen (2019); a fifth genome (MP789) was deposited by Liu et al. The bat coronavirus genome RaTG13 (GenBank MN996532.1) was identified as closely related to the SARS-CoV-2 lineage (Zhou et al, 2020b), which supports a bat coronavirus being the zoonotic source of the epidemic. Despite this close genetic distance, RaTG13 is too distant in time to be itself a direct source of the pandemic SARS-CoV-2 (Boni et al, 2020)
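
To make the profile idea in the highlights above concrete, the following is a minimal sketch of how a statistical description of a short reference domain can be built from aligned sequences and then used to find and score the most similar window in a query protein. It is not the pipeline used in the study: it keeps only per-position match scores (effectively a position-specific scoring matrix) and omits the insert and delete states of a full pHMM. The function names, the uniform background, the pseudocount, and the toy peptide sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Return per-column log2-odds scores versus a uniform background.

    Simplification (assumption): ungapped, equal-length sequences only,
    so there are no insert or delete states as in a real pHMM.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum the column scores; unknown residues (e.g. X) score 0.0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, window))


def best_hit(profile, query):
    """Slide the profile along the query; return (best_score, start_index)."""
    width = len(profile)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    return max(
        (score_window(profile, query[i:i + width]), i)
        for i in range(len(query) - width + 1)
    )


# Hypothetical toy data, not real Sarbecovirus sequences.
reference_domain = ["MKTAYI", "MKTAYV", "MRTAYI"]
profile = build_profile(reference_domain)

score_same, start_same = best_hit(profile, "GGGMKTAYIGGG")
score_variant, start_variant = best_hit(profile, "GGGMKSAYIGGG")
print(start_same, round(score_same, 2))
print(start_variant, round(score_variant, 2))
```

A query window that matches the reference domain scores higher than one carrying a substitution at a conserved position, which is the kind of difference from a reference domain that such a profile is meant to document. A full pHMM additionally models insertions and deletions and is normally built and searched with dedicated software.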


Summary

Introduction

Since the first report of Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019 in Wuhan city, China (Li et al, 2020; Yang et al, 2020) and the declaration of COVID-19 as a global pandemic in March 2020 by the World Health Organization, the disease has proceeded to affect every part of the world. We sought to identify peptide regions unique to SARS-CoV-2 relative to all available Sarbecoviruses, to determine viral features that might have allowed the virus to infect, replicate, and transmit efficiently in humans. Such a comparative analysis of viral proteins might provide insights into the origin of the virus, identify the conditions that led to the zoonosis to humans and to efficient spread without the need for much, if any, adaptation (MacLean et al, 2021), and provide leads for drug and immune targets for effective treatments.

Protein domains and profile hidden Markov models
Genome scans using custom pHMM domains
Spike changes with 15 aa domains
Global proteome similarities
Conclusions
