Abstract

For any given bacteriophage genome, or for phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages and their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an "other" category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences and split it into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets while maintaining the maximum sequence diversity for training. An Artificial Neural Network (ANN) ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and a test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if a protein is not predicted to fall into any of them, as "other," providing a new approach for the functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
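The key data-handling step above is splitting the sequences so that homologous proteins never end up in different subsets. The clustering method itself is not described in this section, so the sketch below only illustrates the general idea under stated assumptions: homology clusters are taken as already computed (by some prior sequence-clustering step), and whole clusters, never single sequences, are assigned to subsets. The function names `split_by_cluster` and `clusters_are_disjoint` and the greedy largest-cluster-first balancing rule are hypothetical, not the published PhANNs procedure.

```python
from collections import defaultdict


def split_by_cluster(cluster_of, n_sets=11):
    """Distribute sequences into n_sets subsets without splitting clusters.

    cluster_of: dict mapping a sequence id to the id of its homology cluster
    (hypothetical input, assumed to come from a prior clustering step).
    Greedy balancing (an assumption, not the published method): clusters are
    taken largest first and each goes to the currently smallest subset.
    """
    members = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        members[cluster_id].append(seq_id)

    subsets = [[] for _ in range(n_sets)]
    # Largest clusters first; the cluster id breaks ties so the result is deterministic.
    for cluster_id in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        # min() returns the lowest index among equally small subsets.
        target = min(range(n_sets), key=lambda i: len(subsets[i]))
        subsets[target].extend(sorted(members[cluster_id]))
    return subsets


def clusters_are_disjoint(subsets, cluster_of):
    """Check that no cluster contributes sequences to more than one subset."""
    seen = {}
    for idx, subset in enumerate(subsets):
        for seq_id in subset:
            if seen.setdefault(cluster_of[seq_id], idx) != idx:
                return False
    return True


if __name__ == "__main__":
    toy = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
    subsets = split_by_cluster(toy, n_sets=3)
    print(subsets)  # [['p4', 'p5', 'p6'], ['p1', 'p2'], ['p3']]
    print(clusters_are_disjoint(subsets, toy))  # True
```

With the default `n_sets=11`, the first ten subsets would play the role of cross-validation folds and the last one the held-out test set, mirroring the 10 + 1 split described in the abstract. Assigning clusters rather than sequences is what prevents a near-identical homolog of a test protein from leaking into training.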

Highlights

  • Bacteriophages are the most abundant biological entity on the Earth [1]

  • An Artificial Neural Network ensemble trained on features extracted from the homology-separated sequence sets reached a test F1-score of 0.875 and a test accuracy of 86.2%

  • We evaluated the performance of 120 Artificial Neural Networks (ANNs) (10 per model type) on their respective validation sets
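The highlights mention an ensemble built from 10 networks per model type and report a test F1-score and accuracy. This section does not say how the networks' outputs are combined or how the F1-score is averaged over the classes, so the sketch below assumes soft voting (averaging per-class scores, then taking the highest) and a macro-averaged F1. All function names are hypothetical, and a support-weighted F1 would give a different value on imbalanced classes.

```python
def ensemble_predict(scores):
    """Soft-voting ensemble: average class scores over networks, take argmax.

    scores[m][i][c] is the score network m assigns to protein i for class c
    (hypothetical layout, e.g. softmax outputs).
    """
    n_models = len(scores)
    predictions = []
    for i in range(len(scores[0])):
        n_classes = len(scores[0][i])
        mean = [sum(scores[m][i][c] for m in range(n_models)) / n_models
                for c in range(n_classes)]
        # Ties resolve to the lowest class index.
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


def accuracy(y_true, y_pred):
    """Fraction of proteins whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / n_classes


if __name__ == "__main__":
    # Two toy networks, three proteins, three classes.
    net_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
    net_b = [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5], [0.2, 0.2, 0.6]]
    y_pred = ensemble_predict([net_a, net_b])
    y_true = [0, 1, 1]
    print(y_pred)  # [0, 1, 2]
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    print(round(macro_f1(y_true, y_pred, n_classes=3), 3))  # 0.556
```

In PhANNs the class axis would cover the ten structural classes plus "other"; the toy example uses three classes only to keep the arithmetic checkable by hand.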


Summary

Introduction

Bacteriophages (phages) are the most abundant biological entity on the Earth [1]. They modulate microbial communities in several ways; for example, by lysing specific taxonomic members or narrow groups of microbiome members, they affect microbial population dynamics and change niche availability for different community members. Phage genes, especially structural ones, vary widely between phages and phage groups, so much so that sequence-alignment-based methods for assigning gene function frequently fail: we are currently unable to assign a function to 50–90% of phage genes [7]. The current increased interest in using phages as therapeutic agents [8,9] motivates annotating as much of the phage genome as possible. Even if such annotations are tentative and not experimentally validated, distinguishing the relatively non-toxic structural proteins from potentially host-health-threatening toxins and other virulence factors could inform the decision to choose one specific phage over another.

