Abstract

Background Most bacterial genomes contain integrated bacteriophages—prophages—in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Methods We present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and f1 score from the analysis of seven different algorithms for the prediction of prophages. Results We identified different strengths and weaknesses between the prophage prediction tools. Several tools exhibit exceptional f1 scores, while others have better recall at the expense of more false positives. The tools vary greatly in runtime performance with few exhibiting all desirable qualities for large-scale analyses. Conclusions Our library of gold-standard prophage annotations and benchmarking framework provide a valuable resource for exploring strengths and weaknesses of current and future prophage annotation tools. We discuss caveats and concerns in this analysis, how those concerns may be mitigated, and avenues for future improvements. This framework will help developers identify opportunities for improvement and test updates. It will also help users in determining the tools that are best suited for their analysis.

Highlights

  • Bacteriophages, viruses that infect bacteria, can be either temperate or virulent

  • Temperate phages may integrate into their bacterial host genome and the host-integrated phage genome is referred to as a prophage

  • Prophages replicate as part of the host bacterial genomes until external conditions trigger a transition into the virulent lytic cycle, resulting in replication and packaging of phages and typically the death of the host bacteria

Read more

Summary

Introduction

Bacteriophages (phages), viruses that infect bacteria, can be either temperate or virulent. Prophages generally contain a set of core genes with a conserved gene order that facilitate integration into the host genome, assembly of phage structural components, replication, and lysis of the host cell (Kang et al, 2017; Canchaya et al, 2003). As well as these core genes, phages can contain an array of accessory metabolic genes that can effect significant phenotypic changes in the host bacteria (Breitbart, 2012). Progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call