Abstract

Background

The genomic sequences of phages isolated on mycobacterial hosts are diverse, mosaic and often share little nucleotide similarity. However, about 30 unique types have been isolated, allowing most phage to be grouped into clusters and further into subclusters [1]. Many tools for the analysis of mycobacteriophage genomes depend on sequence alignment or knowledge of gene content. These methods are computationally expensive, can require significant manual input (for example, gene annotation) and can be ineffective for significantly diverged sequences [2]. We evaluated tetranucleotide usage in mycobacteriophages as an alternative to alignment-based methods for genome analysis.

Description

We computed tetranucleotide usage deviation, the ratio of observed counts of 4-mers in a genome to the expected count under a null model [3]. Tetranucleotide usage deviation is comparable for members of the same phage subcluster and distinct between subclusters. Neighbor joining phylogenetic trees were constructed on pairwise Euclidean distances between all genomes in the mycobacteriophage database. In almost every case, phage were placed in a monophyletic clade with members of the same subcluster. With few exceptions, trees computed from tetranucleotide usage deviation accurately reconstruct trees based on gene content for a subset of the mycobacteriophage population (Figure 1).

We also evaluated the possibility of assigning clusters to unknown phage based on tetranucleotide usage deviation. Under a simple nearest neighbor classifier, cluster assignments were recovered at a frequency greater than 98%.

In addition, we looked for evidence of horizontal gene transfer by using tetranucleotide difference index, a measure of the deviation in tetranucleotide usage from the genomic mean in a sliding window across the genome [3]. Tetranucleotide difference index plots showed a strong spike at the end of cluster L mycobacteriophages, which could indicate horizontal gene transfer in the region.
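To make the two quantities described above concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say which null model was used, so the sketch assumes a zero-order model in which the expected count of a 4-mer is the product of its single-base frequencies times the number of 4-mer positions. The sliding-window score is likewise an assumption: it takes the Euclidean distance between each window's deviation vector and the whole-genome vector as a stand-in for the difference index. The function names `tud` and `difference_index`, and the window and step sizes, are hypothetical.

```python
from collections import Counter
from itertools import product
import math

BASES = "ACGT"
# All 256 tetranucleotides in a fixed order, so vectors are comparable.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def tud(seq):
    """Observed/expected ratio per 4-mer.

    ASSUMPTION: zero-order null model (independent bases); the source
    only says "a null model".
    """
    seq = seq.upper()
    n_windows = len(seq) - 3
    base_counts = Counter(b for b in seq if b in BASES)
    total = sum(base_counts.values())
    if n_windows <= 0 or total == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    freq = {b: base_counts[b] / total for b in BASES}
    observed = Counter(seq[i:i + 4] for i in range(n_windows))
    ratios = []
    for kmer in TETRAMERS:
        expected = n_windows * math.prod(freq[b] for b in kmer)
        # A 4-mer containing an absent base has expected count 0; report 0.
        ratios.append(observed[kmer] / expected if expected > 0 else 0.0)
    return ratios


def difference_index(seq, window=1000, step=500):
    """(start, score) per window.

    ASSUMPTION: score = Euclidean distance between the window's vector
    and the whole-genome vector.
    """
    genome_vec = tud(seq)
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        win_vec = tud(seq[start:start + window])
        scores.append((start, math.dist(win_vec, genome_vec)))
    return scores
```

Under these assumptions, a region whose composition differs from the rest of the genome would show up as a run of windows with elevated scores, which is the kind of spike the abstract reports for cluster L.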

Highlights

  • The genomic sequences of phages isolated on mycobacterial hosts are diverse, mosaic and often share little nucleotide similarity

  • Trees computed from tetranucleotide usage deviation accurately reconstruct trees based on gene content for a subset of the mycobacteriophage population (Figure 1)

  • Under a simple nearest neighbor classifier, cluster assignments were recovered at a frequency greater than 98%

* Correspondence: Benjamin_siranosian@brown.edu
1Center for Computational Molecular Biology, Brown University, Providence, RI, USA
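The nearest neighbor cluster assignment mentioned in the highlights can be sketched as follows. This is an illustrative outline only: it assumes each genome has already been reduced to a numeric usage vector, uses Euclidean distance (the metric the abstract names for tree building), and estimates accuracy by leave-one-out, which the source does not confirm as the evaluation scheme. The function names and the toy vectors and labels are hypothetical.

```python
import math


def nearest_neighbor(query, labeled):
    """Return the label of the vector in `labeled` closest to `query`.

    `labeled` is a list of (label, vector) pairs; distance is Euclidean.
    """
    best_label, _ = min(
        ((label, math.dist(query, vec)) for label, vec in labeled),
        key=lambda pair: pair[1],
    )
    return best_label


def leave_one_out_accuracy(labeled):
    """Fraction of items whose label is recovered from all other items.

    ASSUMPTION: leave-one-out is used here for illustration; the source
    does not state how the >98% figure was estimated.
    """
    hits = 0
    for i, (label, vec) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        hits += nearest_neighbor(vec, rest) == label
    return hits / len(labeled)


# Toy two-dimensional vectors with made-up cluster labels.
toy = [("A", [0.0, 0.0]), ("A", [0.0, 1.0]),
       ("B", [5.0, 5.0]), ("B", [5.0, 6.0])]
print(nearest_neighbor([0.2, 0.1], toy))
print(leave_one_out_accuracy(toy))
```

In practice the vectors would be the 256-element tetranucleotide usage deviation profiles and the labels the known cluster assignments.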

Open Access

Tetranucleotide usage in mycobacteriophage genomes: alignment-free methods to cluster phage and infer evolutionary relationships

Benjamin Siranosian1,2*, Emma Herold, Edward Williams, Chen Ye2, Christopher de Graffenried

From Tenth International Society for Computational Biology (ISCB) Student Council Symposium 2014, Boston, MA, USA. 11 July 2014
