Abstract
A novel RNA virus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for the ongoing outbreak of coronavirus disease 2019 (COVID-19). Population genetic analysis could be useful for investigating the origin and evolutionary dynamics of COVID-19. However, because of extensive sampling bias and the existence of infection clusters during the epidemic spread, direct application of existing approaches can lead to biased parameter estimates and misinterpretation of the data. In this study, we first present a robust estimator for the time to the most recent common ancestor (TMRCA) and the mutation rate, and then apply the approach to analyze 12,909 genomic sequences of SARS-CoV-2. The mutation rate is inferred to be 8.69 × 10⁻⁴ per site per year with a 95% confidence interval (CI) of [8.61 × 10⁻⁴, 8.77 × 10⁻⁴], and the TMRCA of the samples is inferred to be Nov 28, 2019 with a 95% CI of [Oct 20, 2019, Dec 9, 2019]. The results indicate that COVID-19 might have originated earlier than, and outside of, the Wuhan Seafood Market. We further demonstrate that genetic polymorphism patterns, including the enrichment of specific haplotypes and the temporal allele frequency trajectories generated by infection clusters, are similar to those caused by evolutionary forces such as natural selection. Our results show that population genetic methods need to be developed to efficiently disentangle the effects of sampling bias and infection clusters in order to gain insights into the evolutionary mechanism of SARS-CoV-2. Software implementing VirusMuT can be downloaded at https://bigd.big.ac.cn/biocode/tools/BT007081.
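For readers unfamiliar with how a mutation rate and a TMRCA can be read off dated sequences, a common baseline is root-to-tip regression: regress each sequence's divergence from an assumed root on its sampling date, take the slope as the rate per site per year, and take the x-intercept (the date of zero divergence) as the TMRCA. The sketch below is that generic ordinary-least-squares baseline, not the VirusMuT estimator; in particular it makes no correction for sampling bias or infection clusters, which is exactly what the abstract says a robust method must handle. The function names and the input format (sampling dates plus mutation counts relative to a root sequence) are illustrative assumptions.

```python
from datetime import date, timedelta


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year, e.g. 2019-12-01 -> 2019 + 334/365."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def from_decimal_year(y: float) -> date:
    """Inverse of decimal_year, rounded to the nearest day."""
    year = int(y)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=round((y - year) * days_in_year))


def root_to_tip(sample_dates, mutation_counts, genome_length):
    """Hypothetical helper: OLS root-to-tip regression (not VirusMuT).

    sample_dates: collection date of each sequence.
    mutation_counts: differences of each sequence from an assumed root sequence.
    genome_length: number of aligned sites.
    Returns (rate in substitutions per site per year, estimated TMRCA date).
    """
    n = len(sample_dates)
    if n != len(mutation_counts) or n < 2:
        raise ValueError("need at least two dated sequences with mutation counts")
    t = [decimal_year(d) for d in sample_dates]
    y = [m / genome_length for m in mutation_counts]  # divergence per site
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    rate = sxy / sxx  # slope: substitutions per site per year
    if rate <= 0:
        raise ValueError("non-positive slope: no usable temporal signal")
    intercept = y_mean - rate * t_mean
    tmrca = -intercept / rate  # x-intercept: time at which divergence is zero
    return rate, from_decimal_year(tmrca)


# Hypothetical toy data: four sequences and their counts relative to a root.
toy_dates = [date(2020, 1, 15), date(2020, 3, 1), date(2020, 5, 20), date(2020, 8, 30)]
toy_counts = [3, 6, 12, 20]
print(root_to_tip(toy_dates, toy_counts, 29903))
```

The genome length 29,903 used above is that of the Wuhan-Hu-1 reference genome (NC_045512.2). At that length, the reported rate of 8.69 × 10⁻⁴ per site per year corresponds to roughly 26 substitutions per genome per year. Because ordinary least squares weights every sequence equally, heavily sampled clusters pull the fitted line toward themselves, which is one concrete way the sampling bias discussed in the abstract can distort a naive estimate.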
Highlights
… D, and Fs are summary statistics developed for a random sample collected from a contemporary population, they do provide an informative summary of the genetic polymorphisms of the temporally collected SARS-CoV-2 sequences.
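The truncated highlight above does not say which D statistic is meant; assuming it is Tajima's D, the sketch below shows how that statistic compares two estimates of genetic diversity: the mean number of pairwise differences (π) and the number of segregating sites scaled by a₁ (Watterson's estimator). An excess of rare variants makes D negative, and an excess of intermediate-frequency variants makes it positive. The constants follow Tajima's 1989 formula. The function name is hypothetical, the input is assumed to be an ungapped alignment with no missing data, and Fu's Fs is not implemented here.

```python
import math
from itertools import combinations


def tajimas_d(alignment):
    """Tajima's D for an ungapped alignment of equal-length sequences.

    Assumes no missing data: every column with more than one distinct
    character counts as a segregating site.
    """
    n = len(alignment)
    if n < 4:
        raise ValueError("Tajima's D is undefined for fewer than 4 sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for col in zip(*alignment) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(alignment, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Normalizing constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Both inputs to D assume a random sample drawn from one contemporary population. Sequences collected over months, or drawn disproportionately from a single infection cluster, violate that assumption, so a nonzero D in such data need not indicate selection.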
Summary
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel RNA virus of the Coronaviridae family, caused an outbreak of coronavirus disease 2019 (COVID-19) in China in late December 2019 and has since spread rapidly to more than 214 countries and areas [1,2]. Population genetic methods are often used to reconstruct the evolutionary history of viral infectious diseases, supplementing our knowledge of epidemic or pandemic dynamics [3,4,5,6]. One example of an infection cluster is the COVID-19 outbreak on the Diamond Princess cruise ship [9]. Both sampling bias and the presence of infection clusters produce genomic polymorphism patterns similar to those generated by evolutionary forces such as natural selection [10,11]. A direct application of existing population genetic approaches that does not take sampling bias and infection clusters into account could therefore lead to biased parameter estimates and misinterpretation of the data.