Abstract
Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale.
IMPORTANCE Genome evolution of microbes involves parent-to-offspring descent and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
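The general idea of an alignment-free, k-mer-based network can be illustrated with a minimal sketch. It is not the statistic or pipeline used in the study: it substitutes a plain Jaccard index on k-mer sets for the k-mer statistics mentioned above, and the function names, toy sequences, `k`, and `threshold` values are all hypothetical choices made for illustration.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (a simple stand-in similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(genomes, k=4, threshold=0.3):
    """Build network edges between genomes whose similarity meets a threshold.

    genomes: dict mapping a genome name to its sequence string.
    Returns a list of (name1, name2, similarity) tuples.
    """
    profiles = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        s = jaccard(profiles[g1], profiles[g2])
        if s >= threshold:
            edges.append((g1, g2, s))
    return edges


# Toy sequences (assumed for illustration only, not real genomes).
toy = {
    "A": "ACGTACGTGG",
    "B": "ACGTACGTGA",
    "C": "TTTTTTTTTT",
}
edges = similarity_network(toy, k=4, threshold=0.3)
print(edges)
```

In this toy case only A and B share k-mers, so they are the only pair joined by an edge. No sequences are aligned at any step, which is what allows approaches of this kind to scale to many genomes.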
Highlights
Microbial genomes have been shaped by parent-to-offspring descent and lateral genetic transfer
Plasmid and phage sequences in particular are expected to increase the connectivity of phylogenomic networks; any genetic material that becomes established in a new host genome after transmission by such a vector can contribute
These networks capture the relatedness among these genomes, i.e., are phylogenomic; the relative contributions of the vertical and lateral components depend on the subset of data used as input
Summary
Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure. For nearly 100 years following the discovery of diverse bacteria by Pasteur, Koch, Cohn, and others in the latter decades of the 19th century [1], little was known of how these organisms might be related among themselves or to the rest of the living world. This began to change with the recognition that ribosomal RNAs are present in all living cells and contain structural domains that, by virtue of their differential entanglements with core molecular functions and their interactions with greater or lesser numbers of other components of the translational apparatus, can inform on evolutionary history across a range of temporal scales “much as the hands of a clock separately indicate hours, minutes, and seconds” [2]. Plasmid and phage sequences in particular are expected to increase the connectivity of phylogenomic networks; any genetic material that becomes established in a new host genome after transmission by such a vector can contribute.