Abstract

Bacteria currently included in Rhizobium leguminosarum are too diverse to be considered a single species, so we can refer to this as a species complex (the Rlc). We have found 429 publicly available genome sequences that fall within the Rlc and these show that the Rlc is a distinct entity, well separated from other species in the genus. Its sister taxon is R. anhuiense. We constructed a phylogeny based on concatenated sequences of 120 universal (core) genes, and calculated pairwise average nucleotide identity (ANI) between all genomes. From these analyses, we concluded that the Rlc includes 18 distinct genospecies, plus 7 unique strains that are not placed in these genospecies. Each genospecies is separated by a distinct gap in ANI values, usually at approximately 96% ANI, implying that it is a ‘natural’ unit. Five of the genospecies include the type strains of named species: R. laguerreae, R. sophorae, R. ruizarguesonis, “R. indicum” and R. leguminosarum itself. The 16S ribosomal RNA sequence is remarkably diverse within the Rlc, but does not distinguish the genospecies. Partial sequences of housekeeping genes, which have frequently been used to characterize isolate collections, can mostly be assigned unambiguously to a genospecies, but alleles within a genospecies do not always form a clade, so single genes are not a reliable guide to the true phylogeny of the strains. We conclude that access to a large number of genome sequences is a powerful tool for characterizing the diversity of bacteria, and that taxonomic conclusions should be based on all available genome sequences, not just those of type strains.

Highlights

  • The increasing availability of genome-scale DNA sequencing is transforming the practice of bacterial taxonomy

  • We have shown that the R. leguminosarum species complex (Rlc) forms a distinct clade and is clearly separated from other species in the genus by a long branch in the core gene phylogeny and a gap in average nucleotide identity (ANI) values (Figures 1 and 2)

Introduction

The increasing availability of genome-scale DNA sequencing is transforming the practice of bacterial taxonomy. Many authors have used genome sequences to calculate average nucleotide identity (ANI) values, which are a more convenient and accurate substitute for the outdated DNA–DNA hybridization (DDH) laboratory technique [16,17], and perhaps to extract the sequences of 16S rRNA and a few housekeeping genes that would otherwise have required separate amplification and sequencing. These are compared with related species, but usually only with the type strains, i.e., a single strain representing each named species. Multiple core genes can be used to construct very robust phylogenies because discrepancies affecting individual genes are averaged out [19,20], while the distribution of accessory genes may help to distinguish species [21].
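To illustrate how pairwise ANI values can be turned into candidate groupings, the following is a minimal sketch, not the procedure used in this study. It assumes that ANI values have already been computed by some external tool, and it groups genomes by single linkage at a 96% cutoff, the approximate position of the ANI gap noted in the abstract. The strain names, the ANI values and the function name `cluster_by_ani` are hypothetical.

```python
from itertools import combinations


def cluster_by_ani(genomes, ani, threshold=96.0):
    """Group genomes by single linkage: two genomes end up in the same
    group if they are connected by a chain of pairs with ANI >= threshold.

    `ani` maps (genome_a, genome_b) tuples to percent identity; either
    ordering of a pair is accepted, and missing pairs are ignored.
    """
    # Union-find structure: every genome starts as its own group.
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical strains and ANI values, for illustration only.
genomes = ["strainA", "strainB", "strainC", "strainD"]
ani = {
    ("strainA", "strainB"): 98.1,
    ("strainA", "strainC"): 93.4,
    ("strainA", "strainD"): 93.0,
    ("strainB", "strainC"): 93.9,
    ("strainB", "strainD"): 93.6,
    ("strainC", "strainD"): 97.2,
}

clusters = cluster_by_ani(genomes, ani)
print(clusters)
```

With these invented values, strainA and strainB fall into one group and strainC and strainD into another. Single linkage is only one possible choice; a fixed cutoff is useful mainly when, as described for the Rlc, the groups are separated by a clear gap in ANI values.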
