Abstract
Background: Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals. As animal-origin pathogens, coronaviruses can cross the species barrier and cause pandemics in humans. In this study, a deep learning model for early prediction of pandemic risk is proposed based on viral genome sequences.
Methods: A total of 3257 genomes were downloaded from the Coronavirus Genome Resource Library. We present a deep learning model of cross-species coronavirus infection that combines a bidirectional gated recurrent unit network with a one-dimensional convolution. The genome sequence of an animal-origin coronavirus is input directly to extract features and predict pandemic risk. Pre-trained DNA vectors and an attention mechanism were explored to obtain the best performance. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were used to evaluate the predictive models.
Results: The six specific models achieved good performance for their corresponding virus groups (AUROC of 1 and AUPR of 1). The general model with pre-trained vectors and an attention mechanism provided excellent predictions for all virus groups (AUROC of 1 and AUPR of 1), while the models without pre-trained vectors or without the attention mechanism showed a clear reduction in performance (about 5–25%). Re-training experiments showed that the general model has good transfer learning capability (average over the six groups: AUROC of 0.968 and AUPR of 0.942) and should give reasonable predictions for the potential pathogen of the next pandemic. Artificial negative data, in which the coding region of the spike protein was replaced, were also predicted correctly (100% accuracy). An easy-to-use tool was written in Python to implement our predictor.
Conclusions: A robust deep learning model with pre-trained vectors and an attention mechanism learned the features of whole genomes of animal-origin coronaviruses and could predict the risk of cross-species infection, providing early warning of the next pandemic.
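For readers unfamiliar with the two evaluation metrics named in the Methods, the following is a minimal sketch of how AUROC and AUPR can be computed from binary labels and predicted scores. It is not the authors' evaluation code. The function names, the step-wise (average precision) approximation of AUPR, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney U identity).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: AUPR is approximated by averaging the precision at the rank
    of each positive example. Tied scores are not grouped in this sketch.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = human-infecting, 0 = not; scores are model outputs.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print("AUROC:", auroc(toy_labels, toy_scores))
print("AUPR :", average_precision(toy_labels, toy_scores))
```

A perfectly separating model gives 1 for both metrics, which is the situation the Results report for the specific and general models.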
Highlights
Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals
All coronaviruses responsible for epidemics or pandemics come from wild animals, are spread through respiratory droplets and close contact, and can cause severe pneumonia
Initial virus data: Coronavirus sequences were accessed from the Coronavirus Genome Resource Library on June 30, 2020, including the genome sequences of MERS-CoV, human coronavirus (HCoV)-OC43, HCoV-NL63, HCoV-229E, HCoV-HKU1, SARS-CoV and animal-origin coronaviruses [20]
Summary
Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals. Coronaviruses can cross the species barrier and cause pandemics in humans. SARS-CoV, MERS-CoV, and SARS-CoV-2 are highly contagious and have caused three epidemics or pandemics this century [2, 3]. The coronaviruses responsible for pandemics are animal-origin pathogens transmitted to humans through intermediate hosts [4,5,6,7]. The direct host of SARS-CoV-2 is not clear, but the virus appears to be closely linked to bats and pangolins [10, 11]. All coronaviruses responsible for epidemics or pandemics come from wild animals, are spread through respiratory droplets and close contact, and can cause severe pneumonia. A model that predicts the pandemic risk of coronavirus infection in humans is urgently needed, along with better prevention and control of pandemic infectious diseases.