Abstract

Background

Viruses, including bacteriophages, are important components of environmental and human-associated microbial communities. Viruses can act as extracellular reservoirs of bacterial genes, can mediate microbiome dynamics, and can influence the virulence of clinical pathogens. Various targeted metagenomic analysis techniques detect viral sequences, but these methods often exclude large and genome-integrated viruses. In this study, we evaluate and compare the ability of nine state-of-the-art bioinformatic tools (Vibrant, VirSorter, VirSorter2, VirFinder, DeepVirFinder, MetaPhinder, Kraken2, Phybrid, and a BLAST search using identified proteins from the Earth Virome Pipeline) to identify viral contiguous sequences (contigs) across simulated metagenomes with different read distributions, taxonomic compositions, and complexities.

Results

Of the tools tested in this study, VirSorter achieved the best F1 score, while Vibrant had the highest average F1 score at predicting integrated prophages. Though less balanced in its precision and recall, Kraken2 had the highest average precision by a substantial margin. We introduce Phybrid, a machine learning tool that uses both gene content and nucleotide features and that demonstrated an improvement in average F1 score over tools such as MetaPhinder. Adding nucleotide features improves precision and recall compared to gene content features alone. Viral identification by all tools was not impacted by the underlying read distribution but did improve with contig length. Tool performance was inversely related to taxonomic complexity and varied by phage host. For instance, Rhizobium and Enterococcus phages were identified consistently by the tools, whereas Neisseria prophage sequences were commonly missed in this study.

Conclusion

This study benchmarked the performance of nine state-of-the-art bioinformatic tools at identifying viral contigs across different simulation conditions. It also explored the ability of the tools to identify integrated prophage elements, which are traditionally excluded from targeted sequencing approaches. By assessing the tools in a variety of situations, our analysis provides valuable insights to viral researchers looking to mine viral elements from publicly available metagenomic data.
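The comparison above rests on precision, recall, and F1 score computed over contig-level predictions. As a minimal sketch of how these metrics relate, assume each tool's output has been reduced to a set of contig identifiers called viral and that the simulated metagenome supplies the true viral set. The function name, contig IDs, and example sets below are hypothetical and are not taken from the study's pipeline.

```python
def precision_recall_f1(true_viral, predicted_viral):
    """Return (precision, recall, F1) for one tool on one simulated metagenome.

    Both arguments are iterables of contig identifiers (hypothetical names).
    """
    true_viral = set(true_viral)
    predicted_viral = set(predicted_viral)

    tp = len(true_viral & predicted_viral)   # viral contigs correctly called
    fp = len(predicted_viral - true_viral)   # non-viral contigs called viral
    fn = len(true_viral - predicted_viral)   # viral contigs that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only: four true viral contigs in a toy metagenome.
truth = {"contig_1", "contig_2", "contig_3", "contig_4"}

# A conservative caller: everything it reports is correct, but it misses most.
conservative = {"contig_1"}

# A more balanced caller: one false positive, one miss.
balanced = {"contig_1", "contig_2", "contig_3", "contig_9"}

for name, calls in [("conservative", conservative), ("balanced", balanced)]:
    p, r, f1 = precision_recall_f1(truth, calls)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case the conservative caller reaches perfect precision but a low F1 because its recall is poor. A tool can show the same pattern of very high precision without the best F1 score.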

Introduction

Viruses, including bacteriophages, are important components of environmental and human-associated microbial communities. Phages directly contribute to bacterial infections in humans by acting as a genetic reservoir for virulence genes in bacteria such as Escherichia coli, Salmonella enterica, Pseudomonas aeruginosa, Vibrio cholerae, Corynebacterium diphtheriae, and Streptococcus pyogenes [2, 3]. Some phages use Ig-like domains to attach to mucosal layers in humans, where they lie in wait for bacterial prey. This bacteriophage adherence to mucus (BAM) model suggests that phages may act as a non-host-derived innate immune system that modulates the bacterial microbiome [4]. Dysbiosis in the virome has been observed in disease states such as inflammatory bowel disease (IBD), Crohn’s disease, and asthma [7, 8, 9].
