Abstract
This work reports ClassiPhage, a method for classifying phage genomes using sequence-derived taxonomic features. ClassiPhage uses a set of phage-specific Hidden Markov Models (HMMs) generated from clusters of related proteins. The method was validated on all publicly available genomes of phages known to infect Vibrionaceae. These phages belong to the well-described phage families Myoviridae, Podoviridae, Siphoviridae, and Inoviridae. The resulting classification is consistent with the assignments of the International Committee on Taxonomy of Viruses (ICTV): all tested phages were assigned to the corresponding group in the ICTV database. In addition, 44 out of 58 genomes of Vibrio phages not yet classified could be assigned to a phage family. The remaining 14 genomes may represent phages of new families or subfamilies. Comparative genomics indicates that the ability of the approach to identify and classify phages is correlated with conserved genomic organization. ClassiPhage classifies phages exclusively based on genome sequence data and can be applied to distinct phage genomes as well as to prophage regions within host genomes. Possible applications include (a) classifying phages from assembled metagenomes; (b) identifying and classifying integrated prophages; and (c) splitting phage families into subfamilies.
Highlights
Phages, defined as viruses that infect bacteria, are the most abundant biological entities known so far [1,2]
We describe ClassiPhage, a method for phage classification that is independent of a shared molecular marker. It combines hits from multiple profile Hidden Markov Models (HMMs) generated from a set of classified phage proteomes, and produces a Markov-based classification that fits the International Committee on Taxonomy of Viruses (ICTV) classification
We show that, by scanning protein-coding sequences, ClassiPhage reliably classified (i) a set of unclassified vibriophages; (ii) experimentally proven Inoviridae; and (iii) integrated phages in a set of closed and published Vibrio genomes into one of the four phage families
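The idea of assigning a genome to a family from its profile HMM hits can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the input layout, the scoring rule (fraction of a family's HMMs matched by at least one protein), and the 0.3 cutoff are all assumptions made for illustration only.

```python
from collections import defaultdict


def classify_genome(hits, models_per_family, min_fraction=0.3):
    """Assign a genome to the family whose HMMs it matches best.

    hits: iterable of (protein_id, hmm_id, family) tuples, e.g. parsed
          from an HMM search of the genome's predicted proteins
          (hypothetical input layout).
    models_per_family: dict mapping family name -> number of HMMs
          built for that family.
    min_fraction: assumed cutoff below which no family is assigned.
    """
    # Collect the distinct HMMs of each family that were hit at least once.
    matched = defaultdict(set)
    for _protein_id, hmm_id, family in hits:
        matched[family].add(hmm_id)

    # Score each family by the fraction of its HMMs that were matched.
    best_family, best_fraction = None, 0.0
    for family, total in models_per_family.items():
        fraction = len(matched[family]) / total if total else 0.0
        if fraction > best_fraction:
            best_family, best_fraction = family, fraction

    # Genomes with weak support for every family stay unassigned.
    if best_family is None or best_fraction < min_fraction:
        return "unassigned", best_fraction
    return best_family, best_fraction


if __name__ == "__main__":
    models = {"Myoviridae": 4, "Podoviridae": 5,
              "Siphoviridae": 6, "Inoviridae": 10}
    example_hits = [
        ("prot_1", "myo_hmm_1", "Myoviridae"),
        ("prot_2", "myo_hmm_2", "Myoviridae"),
        ("prot_3", "ino_hmm_1", "Inoviridae"),
    ]
    print(classify_genome(example_hits, models))
```

In this toy scoring, a genome that supports no family above the cutoff is left unassigned, which loosely mirrors the abstract's observation that some genomes may represent new families or subfamilies.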
Summary
Phages, defined as viruses that infect bacteria, are the most abundant biological entities known so far [1,2]. The ICTV classification system is based on the evaluation of a variety of phage properties, including the molecular composition of the virus genome (ss/ds, DNA, or RNA), the structure of the virus capsid and whether or not it is enveloped, the host range, pathogenicity, and sequence similarity [4,5]. Based upon these different properties, the ICTV established a highly valuable and widely accepted virus taxonomy. For that matter, a taxonomic characterization based on the phage's genome sequence information has become indispensable [5].