Abstract
Thousands of whole-genome and whole-proteome sequences have been made available through advances in sequencing technology, and sequences of millions more organisms will become available in the coming years. This wealth of genetic information will provide numerous opportunities to enhance our understanding of these organisms, including a greater understanding of relationships among species. Researchers have used 16S rRNA and other gene sequences to study the evolutionary origins of bacteria, but these strategies do not provide insight into the sharing of genes among bacteria via horizontal transfer. In this work, we use an open-source software program called pClust to cluster proteins from the complete proteomes of twelve species of Alphaproteobacteria and generate a dendrogram from the resulting orthologous protein clusters. We compare the results with dendrograms constructed using the 16S rRNA gene and multiple sequence alignment of seven housekeeping genes. Analysis of the whole proteomes of these pathogens grouped Rickettsia typhi with three other animal pathogens, whereas conventional sequence analysis failed to group these pathogens together. We conclude that whole-proteome analysis can give insight into relationships among species beyond their phylogeny, perhaps reflecting the effects of horizontal gene transfer and potentially providing insight into the functions of shared genes by means of shared phenotypes.
Highlights
Because 16S rRNA is highly conserved and the rate of nucleotide changes is slow and predictable, it has become the first-line tool for inferring bacterial phylogeny [1]
Because of the unpredictability of horizontal gene transfer (HGT), it is impossible to precisely identify its phylogenetic impact, but it is possible to capture a snapshot of its effects at a given time and to glean some useful information regarding the transmission of genes among different species by examining whole-genome or whole-proteome sequences
The results described above indicate that the use of whole-proteome sequences has the potential to illuminate fine-scale interrelationships, for example the clustering of patent pathogens that are otherwise segregated when limited sequence data sets are compared
Summary
Because 16S rRNA is highly conserved and the rate of nucleotide changes is slow and predictable, it has become the first-line tool for inferring bacterial phylogeny [1]. Because of the unpredictability of HGT, it is impossible to precisely identify its phylogenetic impact, but it is possible to capture a snapshot of its effects at a given time and to glean some useful information regarding the transmission of genes among different species by examining whole-genome or whole-proteome sequences. Trees were constructed using the 16S rRNA sequences, seven housekeeping genes (see Table 1 of [22]), and the whole-proteome sequences. For the 16S rRNA trees, we used both unweighted and Weighbor-weighted bootstrapping with the neighbor-joining method. For the housekeeping genes, we used multiple sequence alignment followed by tree construction using minimum evolution, neighbor joining, and UPGMA. The whole-proteome approach uses the open-source software program pClust [25] to cluster all orthologous proteins into groups, which, as described in [21], gives significantly better clustering results than clustering via BLAST [26].
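To make the step from protein clusters to a dendrogram more concrete, the following is a minimal sketch of one way such a tree could be built. It is not the procedure used in this work, which the summary does not spell out. It assumes each species is reduced to the set of cluster IDs in which it has at least one protein, uses Jaccard distance between those sets (an assumption), and joins species with UPGMA, one of the methods named above for the housekeeping-gene trees. The species labels, cluster IDs, and function names are hypothetical, and pClust's actual output format is not modeled.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of cluster IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of species by shared clusters.

    profiles: dict mapping a species label to the set of protein-cluster
    IDs that contain at least one protein from that species.
    Returns the tree as nested tuples of species labels.
    """
    names = sorted(profiles)
    trees = {i: name for i, name in enumerate(names)}  # node id -> subtree
    sizes = {i: 1 for i in trees}                      # node id -> leaf count
    # Pairwise distances, keyed by (smaller id, larger id).
    dist = {
        (i, j): jaccard_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        # Pick the closest pair; ties are broken by node id for determinism.
        (i, j), _ = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        new = next_id
        next_id += 1
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # UPGMA update: size-weighted mean of the two old distances.
            dist[(k, new)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (
                sizes[i] + sizes[j]
            )
        del dist[(i, j)]
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return next(iter(trees.values()))


# Toy input: four hypothetical species and made-up cluster IDs.
profiles = {
    "sp_A": {1, 2, 3, 4},
    "sp_B": {1, 2, 3, 5},
    "sp_C": {6, 7, 8},
    "sp_D": {6, 7, 9},
}
tree = upgma(profiles)
print(tree)  # (('sp_A', 'sp_B'), ('sp_C', 'sp_D'))
```

In this toy input, sp_A and sp_B share three of five clusters and are joined first, followed by sp_C and sp_D. A presence/absence distance of this kind groups species by shared gene content, which is why a tree built this way could reflect horizontal transfer as well as descent.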