Abstract

The SARS-CoV-2 pandemic has raised concerns about identifying the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed DeepHoF, a deep learning method that automatically extracts viral genomic features to predict, for novel viruses, host likelihood scores on five host types: plant, germ, invertebrate, non-human vertebrate and human. DeepHoF made up for the lack of an accurate tool, reaching a satisfactory AUC of 0.975 in the five-class classification, and could make reliable predictions for novel viruses without close phylogenetic neighbors. Additionally, because existing tools could not efficiently infer the host species of SARS-CoV-2, we conducted an in-depth analysis of the host likelihood profile calculated by DeepHoF. Using the isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, and that minks might be among the most noteworthy hosts. Several SARS-CoV-2 genes proved significant in determining the host range. Furthermore, a large-scale genome analysis based on DeepHoF's predictions for the later stage of the pandemic in 2020 revealed that the host range was uniform across SARS-CoV-2 samples and that SARS-CoV-2 in humans and in minks was strongly associated.
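The abstract does not state how the five-class AUC of 0.975 was averaged. One common convention is to compute a one-vs-rest ROC AUC for each host type and take the macro average. The sketch below does this in plain Python, assuming that each virus has one or more known host types and one score per host type; the function names are hypothetical, and this is not necessarily how the reported figure was computed.

```python
from typing import Sequence, Set

# Column order assumed for the score vectors (the five host types from the abstract).
HOST_TYPES = ("plant", "germ", "invertebrate", "non-human vertebrate", "human")


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Equivalent to the normalised Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(score_rows: Sequence[Sequence[float]],
                  true_hosts: Sequence[Set[int]],
                  n_classes: int = len(HOST_TYPES)) -> float:
    """Unweighted mean of the one-vs-rest AUC over all host types.

    score_rows[i][k] is the (hypothetical) score of virus i for host type k;
    true_hosts[i] is the set of host-type indices virus i is known to infect,
    which may contain more than one index (e.g. human and non-human vertebrate).
    """
    per_class = []
    for k in range(n_classes):
        column = [row[k] for row in score_rows]
        is_k = [1 if k in hosts else 0 for hosts in true_hosts]
        per_class.append(binary_auc(column, is_k))
    return sum(per_class) / n_classes
```

A macro average weights every host type equally regardless of how many viruses it contains; a weighted or micro average would instead favour host types with more examples, so the two can differ on imbalanced data.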

Highlights

  • The global COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created a long-lasting quest to look for hosts of the virus since the pandemic outbreak; the majority view is that the virus probably originated from bats [1]

  • For the identification of the five host types, our model significantly outperforms the Basic Local Alignment Search Tool (BLAST) and can discriminate between human-infective and non-human-infective viruses within a virus group such as Coronaviridae

  • Using 17 SARS-CoV-2 isolates sequenced in the earliest stage of COVID-19 detection, DeepHoF predicted that SARS-CoV-2 could infect humans and non-human vertebrates, which has since been confirmed by the pandemic


Summary

Introduction

The global COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created a long-lasting quest to look for hosts of the virus since the pandemic outbreak; the majority view is that the virus probably originated from bats [1]. Several published tools for identifying viral hosts, such as ViralHostPredictor [15], HostPhinder [16], WIsH [17], Host Taxon Predictor [18] and VIDHOP [19], overcame the limitations of sequence-similarity-based strategies by applying machine learning to viral sequences or to genomic traits related to virus-host interactions. Although these tools performed well under some conditions, they are not considered applicable to a novel virus whose host range is unknown, such as SARS-CoV-2. VIDHOP, a deep-learning-based tool, is designed to predict potential hosts of viruses, but its application is limited to three viral species: influenza A, rabies lyssavirus and rotavirus A.
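As a generic illustration of the kind of alignment-free genomic trait that machine-learning host predictors can use in place of sequence similarity, the sketch below computes a k-mer composition profile of a genome. It is not the feature set of DeepHoF or of any tool cited above; the function name `kmer_profile` and the choice of k are assumptions for illustration only.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    """Relative frequency of every k-mer over A/C/G/T, in lexicographic order.

    Windows containing any other symbol (e.g. the ambiguity code N) are skipped.
    Returns a vector of length 4**k; all zeros if no valid window exists.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA genomes as DNA
    alphabet = "ACGT"
    valid = set(alphabet)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]
```

Profiles of different genomes have the same fixed length, so they can be compared with any vector distance or passed to a classifier without first aligning the sequences, which is what makes such features usable for viruses that have no close relatives in a reference database.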
