Abstract

The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation, the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.

Highlights

  • The SSU rRNA gene (known as the 16S rRNA gene, referred to as “16S” hereafter) is widely used in studies of microbial ecology as a “barcode gene” [1] to quantify microbial community structure and diversity [2,3]

  • The widespread adoption of 16S as a microbial barcode gene has been driven by several desirable properties of the gene, including the fact that it is universal across bacteria and archaea, it can be amplified from a wide diversity of taxa at one time by the polymerase chain reaction (PCR), it is phylogenetically informative, and it can be used to identify and phylotype sequences based on extensive databases of 16S sequences with associated taxonomic and phylogenetic information [2,4]


Summary

Introduction

The SSU rRNA gene (known as the 16S rRNA gene, referred to as “16S” hereafter) is widely used in studies of microbial ecology as a “barcode gene” [1] to quantify microbial community structure and diversity [2,3]. There are numerous advantages to using 16S as a microbial community barcode gene, but also numerous disadvantages, including amplification and sequencing bias and error [5,6], difficulty with the accurate taxonomic identification and binning of short sequences [7,8,9,10], and a lack of benchmark studies to guide decisions about quality control, filtering, and analysis of 16S sequence data sets derived from novel sequencing technologies. Another disadvantage of the 16S gene, one that is relevant to inferring microbial abundance from 16S gene sequence abundances, is that genomic 16S copy number varies a great deal across the tree of life [11,12,13]. The use of a single-copy protein-coding gene such as rpoB as a microbial barcode would avoid this problem [11,19,20].
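To illustrate why copy-number variation matters for abundance estimates, the following minimal Python sketch compares relative gene abundance with copy-number-corrected relative organismal abundance. It is not the authors' method: the taxon names, read counts, and copy numbers are hypothetical, and the copy numbers are assumed to be known in advance, whereas the approach described in the abstract estimates them through phylogenetic placement and ancestral state estimation.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(gene_counts, copy_numbers):
    """Divide each taxon's 16S gene count by its genomic 16S copy number,
    then renormalize to obtain relative organismal abundance."""
    organism_counts = {
        taxon: gene_counts[taxon] / copy_numbers[taxon] for taxon in gene_counts
    }
    return relative_abundance(organism_counts)


# Hypothetical community: all values below are illustrative assumptions.
gene_counts = {"taxon_A": 100, "taxon_B": 500, "taxon_C": 1000}  # 16S reads
copy_numbers = {"taxon_A": 1, "taxon_B": 5, "taxon_C": 10}       # 16S copies per genome

gene_rel = relative_abundance(gene_counts)
organism_rel = copy_number_corrected(gene_counts, copy_numbers)

for taxon in gene_counts:
    print(f"{taxon}: gene={gene_rel[taxon]:.3f} organism={organism_rel[taxon]:.3f}")
```

In this hypothetical community the three taxa are equally abundant as organisms, yet taxon_C contributes 62.5% of the 16S gene sequences simply because it carries ten copies per genome.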
