Abstract
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer “core” genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.
Highlights
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes
By using a purpose-built “prototype selection” algorithm to maximize evenness of genome sampling (Supplementary Fig. 1, detailed in Supplementary Note 1) and by incorporating multiple additional criteria, including marker gene presence, genome quality, and taxonomy, we selected 10,575 genomes, covering 146 of 153 phyla defined by NCBI, plus all 89 classes, 196 of 199 orders, 422 of 429 families, 2081 of 2117 genera, and 9105 of 20,779 species (Fig. 2a)
When tested against the metagenome-assembled genome (MAG) quality standard established by Bowers et al.[6], only 10.4% of MAGs, or 3.7% of all genomes, fall within the low-quality draft category, while the remainder meet the criteria for either high- or medium-quality drafts (Fig. 2e)
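To make the idea of "maximizing evenness of genome sampling" concrete, the sketch below shows greedy farthest-point (max-min) selection on a precomputed distance matrix. This is a generic stand-in chosen for illustration, not the "prototype selection" algorithm described in Supplementary Note 1; the function name `farthest_point_selection`, the `start` parameter, and the toy one-dimensional "genomes" are all assumptions introduced here.

```python
def farthest_point_selection(dist, k, start=0):
    """Greedily pick k item indices from a square distance matrix.

    Illustrative stand-in only (hypothetical helper, not the paper's method):
    each new pick is the unselected item whose nearest already-selected
    item is farthest away, which spreads picks across the space.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of items")
    selected = [start]
    chosen = {start}
    # min_dist[i] = distance from item i to its nearest selected item
    min_dist = list(dist[start])
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update nearest-selected distances with the new pick
        for i in range(n):
            min_dist[i] = min(min_dist[i], dist[nxt][i])
    return selected


# Toy data (hypothetical): six items placed on a line, forming a dense
# cluster near 0, a pair near 5, and a singleton at 10.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
picked = farthest_point_selection(dist, k=3)
print(picked)
```

On this toy input the three picks land in three different clusters rather than oversampling the dense cluster near 0, which is the behavior an evenness-oriented sampler is meant to have. A real genome-scale selection would also need the additional criteria the highlight lists (marker gene presence, genome quality, taxonomy), which this sketch does not model.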
Summary
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Recent years have seen discoveries of novel microbial groups enabled by culture-based and metagenomic methods[4,5,6,7], many of which represent previously unknown biodiversity[4,8,9] and continue to update our knowledge of the extent of, and relationships among, domains as indicated by phylogenetics[10,11,12,13]. Among these new discoveries is the candidate phyla radiation (CPR, also referred to as Patescibacteria)[4,8], a highly diversified clade of mainly uncultivated microorganisms that may subdivide the domain Bacteria[11], although this scenario remains controversial[14]. A practical dilemma is imposed by computational limitations: adding breadth across the phylogenetic space requires more computing effort, which forces a compromise on either the number of genes analyzed[11] or the robustness of the tree-building algorithms[14].