Abstract

Background: Microbial genetic diversity is often investigated via the comparison of relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples using phylogenetic trees, direct BLAST matches, or phylotype counts. However, are we missing novel lineages in the microbial dark universe by relying on standard phylogenetic and BLAST methods? If so, how can we probe that universe using alternative approaches? We performed a novel type of multi-marker analysis of genetic diversity exploiting the topology of inclusive sequence similarity networks.

Results: Our protocol identified 86 ancient gene families, well distributed and rarely transferred across the 3 domains of life, and retrieved their environmental homologs among 10 million predicted ORFs from human gut samples and other metagenomic projects. Numerous highly divergent environmental homologs were observed in gut samples, although the most divergent genes were over-represented in non-gut environments. In our networks, most divergent environmental genes grouped exclusively with uncultured relatives, in maximal cliques. Sequences within these groups were under strong purifying selection and presented a range of genetic variation comparable to that of a prokaryotic domain.

Conclusions: Many gene families included environmental homologs that were highly divergent from cultured homologs: in 79 gene families (including 18 ribosomal proteins), Bacteria and Archaea were less divergent from each other than some groups of environmental sequences were from any cultured or viral homolog. Moreover, some groups of environmental homologs branched very deeply in phylogenetic trees of life, when they were not too divergent to be aligned. These results underline how limited our understanding of the most diverse elements of the microbial world remains, and encourage a deeper exploration of natural communities and their genetic resources, hinting that major, still unknown divisions of life remain to be discovered.

Reviewers: This article was reviewed by Eugene Koonin, William Martin and James McInerney.

Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0092-3) contains supplementary material, which is available to authorized users.

Highlights

  • Microbial genetic diversity is often investigated via the comparison of relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples using phylogenetic trees, direct BLAST matches, or phylotype counts

  • When homologs were present in the 3 domains of life, our graph presented the typical pattern described in [33], with eukaryotic sequences from bacterial origins connecting to bacterial sequences while eukaryotic sequences from archaeal origins connected to archaeal sequences

  • All gene families with a strong signal in the network could be exploited. These sequences showed clear homology as well as true divergence between archaeal and bacterial sequences and allowed us to depart from the conventional set of highly conserved markers, such as those found by AMPHORA [34] or PhyEco [35], and to offer a complementary analysis of genetic diversity
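
The network pattern reported in the abstract, in which divergent environmental genes group exclusively with uncultured relatives in maximal cliques, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy edge list, the sequence names, the `environmental` label set and the helper `env_only_cliques` are all hypothetical, and the sketch assumes that an edge simply means two sequences passed some similarity threshold (the actual thresholds are not given in this text). Maximal cliques are enumerated with a plain Bron-Kerbosch search with pivoting.

```python
from collections import defaultdict

def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # nothing left to add: r is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def env_only_cliques(edges, environmental, min_size=2):
    """Return maximal cliques made up exclusively of environmental sequences."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    hits = [
        sorted(c) for c in maximal_cliques(adj)
        if len(c) >= min_size and c <= environmental
    ]
    return sorted(hits)

# Hypothetical toy similarity network (assumption: edge = similarity above a threshold).
edges = [
    ("bact_1", "bact_2"), ("bact_1", "env_a"), ("bact_2", "env_a"),
    ("arch_1", "env_a"),
    ("env_x", "env_y"), ("env_x", "env_z"), ("env_y", "env_z"),
]
environmental = {"env_a", "env_x", "env_y", "env_z"}

result = env_only_cliques(edges, environmental)
print(result)
```

In this toy network, `env_a` sits in cliques that also contain cultured sequences, so only the group formed by `env_x`, `env_y` and `env_z` qualifies as an environment-only maximal clique.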


Summary

Introduction

Microbial genetic diversity is often investigated via the comparison of relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples using phylogenetic trees, direct BLAST matches, or phylotype counts. The number of bacterial and archaeal lineages in the ribosomal tree has continued to grow since Woese published a first tree of life in 1987 [1, 2]. These environmental sequences are rarely identical to sequences from cultured organisms, which are estimated to represent less than 1 % of species diversity [3]. The majority of the environmental sequences produced in that study were only modestly related to known ribosomal sequences (showing less than 85 % identity with known sequences). Two of these novel divisions presented more than two representatives, giving a better sense of the significant phylogenetic depth of these lineages. Phylogenetic analyses of environmental sequences hinted at the existence of previously unobserved archaeal [16, 17, 18] and eukaryotic lineages [19, 20, 21, 22].

