Abstract

Background: Coronavirus can cross the species barrier and infect humans with a severe respiratory syndrome. SARS-CoV-2, which potentially originated in bats, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronavirus for early warning.

Methods: The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) Database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, whereas 2159 non-human-origin viruses were regarded as negative samples. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified by the multidimensional scaling method, which was used to explore the pattern of human coronavirus.

Results: The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g = 3) feature. The predictive model achieved a maximum accuracy (ACC) of 98.18% and a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters of human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses use the same human receptor (angiotensin-converting enzyme II). The large gap in the distance curve suggests that the origin of SARS-CoV-2 is not clear and that field surveillance should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and that the challenge to public health remains.

Conclusions: The optimal feature (GGAP, g = 3) performed well in predicting infection risk and could be used to explore evolutionary dynamics in a simple, fast, and large-scale manner. The study may be beneficial for the surveillance of coronavirus genome mutation in the field.
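To make the GGAP feature concrete, the following is a minimal sketch of G-gap dipeptide composition, not the authors' implementation. It assumes the common definition in which a g-gap dipeptide is the residue pair at positions i and i + g + 1, and in which each of the 400 possible pairs is reported as a frequency. The function name `ggap_composition` and the choice to skip pairs containing non-standard residues are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids; all 400 ordered pairs define the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def ggap_composition(sequence, g=3):
    """Return the 400-dimensional g-gap dipeptide frequency vector.

    Assumption: a g-gap dipeptide pairs residue i with residue i + g + 1,
    so g = 0 reduces to ordinary adjacent dipeptide composition.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        # Assumption: pairs with ambiguous residues (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence, not real spike protein data.
features = ggap_composition("ACDEFGHIK", g=3)
print(len(features))
```

A vector of this kind, computed per spike protein sequence, is what a classifier such as a random forest would take as input.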

Highlights

  • Coronavirus can cross the species barrier and infect humans with a severe respiratory syndrome

  • The performance varied from 96.15 to 98.18% for ACC and from 0.9243 to 0.9638 for Matthews correlation coefficient (MCC). This indicated that the feature G-gap dipeptide composition (GGAP) with parameter 3 had the optimal representation ability to distinguish coronaviruses with different phenotypes of cross-species transmission

  • The optimal GGAP feature representation could be explored to monitor the evolutionary dynamics of coronavirus
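For readers unfamiliar with the two metrics quoted above, here is a small sketch of how ACC and MCC are computed from a binary confusion matrix. The function name and the counts in the example are hypothetical and are not taken from the study. MCC is often reported alongside ACC when classes are imbalanced, as with the 507 positive and 2159 negative samples here.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy and Matthews correlation coefficient.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts for illustration only.
acc, mcc = accuracy_and_mcc(tp=90, tn=80, fp=20, fn=10)
print(f"ACC = {acc:.4f}, MCC = {mcc:.4f}")
```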


Summary

Introduction

Coronavirus can cross the species barrier and infect humans with a severe respiratory syndrome. SARS-CoV-2, which potentially originated in bats, is still circulating in China. A prediction model is proposed to evaluate the infection risk of non-human-origin coronavirus for early warning. Coronavirus (CoV) belongs to the order Nidovirales and can infect humans, mammals, and birds [1]. There are seven human coronaviruses: 229E (α-CoV), NL63 (α-CoV), OC43 (β-CoV), HKU1 (β-CoV), MERS-CoV (β-CoV), SARS-CoV (β-CoV), and SARS-CoV-2 (β-CoV). MERS-CoV, SARS-CoV, and SARS-CoV-2 can infect humans and induce serious pneumonia with many fatal cases [3]. SARS-CoV caused a worldwide epidemic, and 774 fatal cases were reported [3]. SARS-CoV-2 is still circulating in China [4,5,6].

