Abstract

Background: Genomes of lower organisms have been observed with a large amount of horizontal gene transfers, which cause difficulties in their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods.

Results: We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages comprising a distinctive divergent genome organization and identified nine new phage members to the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2, Clostridium phi3626, Geobacillus GBSV1, and Listeria monocytogenes PSA.

Conclusion: The method described in this paper can assist evolutionary study through objectively classifying genomes based on their resemblance in gene order, gene content and gene positions. The method is suitable for application to genomes with high genetic exchange and various conserved gene arrangement, as demonstrated through our application on phages.
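
To make the idea of associating a cluster with a gene arrangement pattern more concrete, the following is a minimal illustrative sketch and not the authors' vectorization strategy or the HARP algorithm, neither of which is specified in this abstract. It assumes, purely for illustration, that each genome is encoded as the relative start positions of the five genes named above, and that two genomes share a "subspace" on the genes whose relative positions agree within a tolerance. The function names, the tolerance value and the two example genomes are all hypothetical.

```python
from typing import Dict, List, Optional

# Genes named in the abstract; using them as vector dimensions is an assumption.
MARKER_GENES = ["terminase", "portal", "integrase", "holin", "lysin"]


def vectorize_genome(gene_starts: Dict[str, int],
                     genome_length: int) -> List[Optional[float]]:
    """Encode a genome as relative gene positions (None where a gene is absent)."""
    return [
        gene_starts[g] / genome_length if g in gene_starts else None
        for g in MARKER_GENES
    ]


def shared_subspace(u: List[Optional[float]],
                    v: List[Optional[float]],
                    tol: float = 0.05) -> List[str]:
    """Return the genes on which two genome vectors agree within `tol`.

    A subspace cluster is tied to such a subset of dimensions rather than
    to all of them; `tol` is an arbitrary illustrative threshold.
    """
    return [
        gene
        for gene, a, b in zip(MARKER_GENES, u, v)
        if a is not None and b is not None and abs(a - b) <= tol
    ]


# Hypothetical genomes (made-up coordinates, not real phage data).
vec_a = vectorize_genome(
    {"terminase": 2000, "portal": 4000, "integrase": 30000, "holin": 36000},
    genome_length=40000,
)
vec_b = vectorize_genome(
    {"terminase": 3000, "portal": 6000, "integrase": 10000, "lysin": 45000},
    genome_length=50000,
)

common = shared_subspace(vec_a, vec_b)
print(common)
```

In this toy example the two genomes agree only on the terminase and portal positions, so a cluster containing both would be described by that two-gene pattern. A genome could equally share a different pattern with a third genome, which is the situation that overlapping clusters are meant to capture.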

Highlights

  • Genomes of lower organisms have been observed with a large amount of horizontal gene transfers, which cause difficulties in their evolutionary study

  • We have proposed the use of an overlapping subspace clustering algorithm to assist evolutionary study through objectively classifying genomes based on their resemblance in gene order, gene content and genome positions

  • The advantage of subspace clustering over traditional clustering is the ability to associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced


Summary

Introduction

Genomes of lower organisms have been observed with a large amount of horizontal gene transfers, which cause difficulties in their evolutionary study. For microorganisms including viruses and bacteriophages, a phylogenetic tree may not completely describe their relationship because of the relatively large amount of horizontal gene transfers (HGT) in their evolutionary history [2,3,4]. Alternative strategies such as genome classification based on gene distribution [5] and classification based on short nucleotide sequences [6] have recently been proposed to provide different perspectives for understanding their genomic relationships. A number of computational methods related to gene distribution and genome rearrangement are currently available; however, these methods focus mainly on the close inspection of a few related species and on tree reconstruction, and are not capable of discovering clusters among a large collection of genomes. Details of these methods are provided in the Discussion section. The method, SynFPS, derives a score for each pair of genomes from gene-gene distances and applies K-
