Abstract

Automated DNA sequencing technology is so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at the rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in those prokaryotes that occupy the same niche, and that this is due to both vertical and horizontal transmission. To test this, we have developed a novel method to classify prokaryotes, by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as ‘mountains’ on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate the ability of this map, which we term a “niche map”, to cluster according to ecological niche both quantitatively and qualitatively, and propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.
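The core idea of the method, representing each organism by its Pfam domain distribution and grouping organisms with similar distributions, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the organism names and domain labels (`PF_A`, `PF_B`, ...) are placeholders, cosine distance and single-linkage merging stand in for whatever similarity measure and clustering procedure the authors actually used, and the threshold of 0.5 is arbitrary. The sketch does not attempt the two-dimensional 'mountain' visualization.

```python
from collections import Counter
from math import sqrt

def domain_distribution(domains):
    """Convert a list of domain hits (one label per hit) into relative frequencies."""
    counts = Counter(domains)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)

def cluster(profiles, threshold):
    """Single-linkage grouping: merge two groups whenever any cross-group
    pair of organisms is closer than `threshold` (an assumed cutoff)."""
    clusters = [{name} for name in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine_distance(profiles[a], profiles[b]) < threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical toy data: placeholder organisms and placeholder domain labels.
genomes = {
    "soil_isolate_1": ["PF_A", "PF_A", "PF_B", "PF_C"],
    "soil_isolate_2": ["PF_A", "PF_B", "PF_B", "PF_C"],
    "gut_isolate_1": ["PF_D", "PF_D", "PF_E", "PF_A"],
    "gut_isolate_2": ["PF_D", "PF_E", "PF_E"],
}

profiles = {name: domain_distribution(hits) for name, hits in genomes.items()}
clusters = cluster(profiles, threshold=0.5)
for group in clusters:
    print(sorted(group))
```

In this toy input, organisms that share most of their domain repertoire end up in the same group regardless of how they would be placed on a 16S rRNA tree, which is the behaviour the niche map is intended to capture.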

Highlights

  • Available sequenced prokaryote genomes will soon number in the thousands

  • The Protein family (Pfam) annotation is an extensive collection of manually curated protein multiple sequence alignments that describes each protein family as a set of conserved domains related to a particular function

  • We present a novel computational method for clustering organisms according to their protein domain distribution and demonstrate, both quantitatively and qualitatively, that the resulting niche map correlates to the concept of ecological niche [8] better than a phylogenetic map

Introduction

Available sequenced prokaryote genomes will soon number in the thousands. Genomes of the major microbial model organisms have been sequenced, some multiple times; projects to sequence new genomes will select from a pool of increasingly obscure organisms. New algorithms that expand phylogenetics to incorporate the entire genome, including those based on average amino acid identity [13], shared gene orthology [14,15], protein structures or domains [16,17,18], and correlated indel alignments [19], were developed either to verify existing 16S rRNA phylogeny or to suggest new phylogenetic relationships [18]. These algorithms are not optimized to discern the genomic relationship between organisms in a comprehensive way, since they ignore or minimize the effects of horizontal gene transfer (HGT).
