Abstract

Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. Because the viruses are never isolated, however, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods also contain a significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications.
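To make the abundance-based signal concrete, here is a minimal sketch in Python. It assumes a hypothetical table of phage and bacterial relative abundances measured across the same set of samples, and ranks candidate hosts by the Pearson correlation between their abundance profile and the phage's. The function names and toy numbers are illustrative only, and Pearson correlation is just one of several co-occurrence measures that could be used.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / (sx * sy)


def rank_hosts_by_abundance(phage_profile, host_profiles):
    """Return (host, r) pairs sorted from most to least correlated (hypothetical helper)."""
    scores = [(name, pearson(phage_profile, prof)) for name, prof in host_profiles.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical relative abundances across five samples.
phage = [0.1, 0.4, 0.2, 0.8, 0.5]
hosts = {
    "host_A": [1.0, 3.9, 2.1, 8.2, 5.0],  # rises and falls with the phage
    "host_B": [5.0, 1.0, 4.0, 0.5, 2.0],  # roughly anti-correlated
}
ranking = rank_hosts_by_abundance(phage, hosts)
print(ranking[0][0])  # → host_A
```

A strongly positive correlation suggests, but does not prove, a phage–host link; with few samples, spurious correlations are common.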

Highlights

  • Until recently, viruses could only be identified by using culture-based methods

  • To assess the power of aligning bacterial clustered regularly interspaced short palindromic repeat (CRISPR) spacers to phage genome sequences for recognizing phage–host associations, we identified all CRISPR arrays in the 2698 bacterial genomes in our benchmarking dataset and assessed to what extent their spacers could be aligned to the phage genomes

  • When phages are matched to their hosts based on sequence information, as reviewed here, the host of a phage that is related to an integrated prophage is readily detected by finding an exact match in the bacterial genome that spans the full length of the isolated phage genome
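The CRISPR spacer signal can be sketched as an ungapped search for near-exact spacer matches on both strands of a phage genome. This is a naive illustration, assuming the spacers have already been extracted from the host's CRISPR arrays (array detection is not shown) and using a hypothetical mismatch cut-off `max_mismatches`. Because it compares the spacer against every window, it is only practical for small inputs; dedicated alignment tools would be used at scale.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def spacer_hits(spacer, genome, max_mismatches=1):
    """Find ungapped matches of a spacer on either strand of a genome.

    Returns (strand, start, mismatches) tuples; start is 0-based on the
    strand being scanned. Function name and threshold are hypothetical.
    """
    spacer = spacer.upper()
    g = genome.upper()
    hits = []
    for strand, seq in (("+", g), ("-", reverse_complement(g))):
        for start in range(len(seq) - len(spacer) + 1):
            mm = hamming(spacer, seq[start:start + len(spacer)])
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


def shares_spacer(spacers, phage_genome, max_mismatches=1):
    """True if any host spacer matches the phage genome within the threshold."""
    return any(spacer_hits(s, phage_genome, max_mismatches) for s in spacers)


# Toy phage sequence; the spacer below occurs exactly at position 7 (+ strand).
phage = "ATGCGTACGTTAGCCGATCGATGCTAGCTAGGCTA"
print(spacer_hits("CGTTAGCCGATC", phage, max_mismatches=0))  # → [('+', 7, 0)]
```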


Summary

Introduction

Until recently, viruses could only be identified by using culture-based methods. For phages, i.e. viruses that infect Bacteria or Archaea and that constitute the majority of the global virosphere, isolation by plaquing on a bacterial lawn has been the mainstay of viral identification. Metagenomic sequencing now yields phage genomes without an isolate, so their hosts must be inferred computationally. The in silico signals that can link phages to their hosts include co-occurrence of phages and hosts across environments, genetic homology and exact matches between phage and host genes, the presence of bacterially encoded CRISPR spacers in the phage genomes, and correlations in nucleotide usage profiles (see Table 1).
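Nucleotide usage profiles are commonly summarized as k-mer frequency vectors. As a minimal sketch, assuming k = 4 by default and Euclidean distance as the dissimilarity (published tools use a variety of other measures), candidate hosts can be ranked by how close their k-mer profile lies to the phage's. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in seq, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)  # windows containing N etc. are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def euclidean(p, q):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_hosts_by_composition(phage_seq, host_seqs, k=4):
    """Rank candidate hosts by k-mer profile distance (smaller = more similar)."""
    phage = kmer_profile(phage_seq, k)
    scores = [(name, euclidean(phage, kmer_profile(s, k))) for name, s in host_seqs.items()]
    return sorted(scores, key=lambda item: item[1])


# Toy sequences with k = 2 so the profiles stay small.
ranking = rank_hosts_by_composition(
    "ATATATATAT",
    {"host_AT": "TATATATATA", "host_GC": "GCGCGCGCGC"},
    k=2,
)
print(ranking[0][0])  # → host_AT
```

Real genomes are far longer than these toys, and tetranucleotide (k = 4) profiles are a common choice because they stay well populated even for phage-sized sequences.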

