Abstract

Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises true meta-analyses, although this fact is often disregarded.

Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as the currently most frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina's HiSeq2000 platform. This allowed evaluating important factors of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR error, and 3) region/primer-specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 96%. On average 99.94% and 92.02% of the reads could be assigned to at least family or genus level, respectively, while assignment to 'spurious genera' represented on average only 0.02% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not the case when samples were analysed using QIIME.

Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance. NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.

Highlights

  • Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA gene has transformed the methodological landscape describing microbial diversity within and across complex biomes

  • Abundance thresholds are commonly used to remove spurious Operational Taxonomic Units (OTUs) generated by sequencing and PCR errors[8,36]. Previous studies, however, applied a fraction threshold defined by the complete dataset under study, thereby ignoring sample size heterogeneity, which may lead to under-representation of asymmetrically distributed OTUs

  • We presented NG-Tax, an improved pipeline for 16S ribosomal RNA (rRNA) gene amplicon sequencing data, which continues to be a backbone in the analysis of microbial ecosystems
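The second highlight contrasts a fraction threshold computed over the complete dataset with one that accounts for differences in sample size. The sketch below illustrates that contrast only. The read counts, OTU names, function names and the 0.5% cutoff are all hypothetical, and the code is not taken from NG-Tax.

```python
from collections import Counter


def dataset_wide_filter(samples, fraction):
    """Keep OTUs whose pooled count is at least `fraction` of ALL reads in the dataset."""
    pooled = Counter()
    for counts in samples.values():
        pooled.update(counts)
    total_reads = sum(pooled.values())
    keep = {otu for otu, n in pooled.items() if n >= fraction * total_reads}
    return {
        name: {otu: n for otu, n in counts.items() if otu in keep}
        for name, counts in samples.items()
    }


def per_sample_filter(samples, fraction):
    """Keep, within each sample, OTUs with at least `fraction` of THAT sample's reads."""
    filtered = {}
    for name, counts in samples.items():
        depth = sum(counts.values())
        filtered[name] = {otu: n for otu, n in counts.items() if n >= fraction * depth}
    return filtered


# Hypothetical counts: one deeply and one shallowly sequenced sample.
samples = {
    "deep": {"otu_A": 99_000, "otu_B": 1_000},
    "shallow": {"otu_A": 900, "otu_C": 100},
}

CUTOFF = 0.005  # assumed 0.5% threshold, chosen only for illustration

by_dataset = dataset_wide_filter(samples, CUTOFF)
by_sample = per_sample_filter(samples, CUTOFF)

print("dataset-wide:", by_dataset)
print("per-sample:  ", by_sample)
```

In this toy input, otu_C makes up 10% of the shallow sample but only about 0.1% of the pooled reads, so the dataset-wide threshold discards it while the per-sample threshold keeps it.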

Summary

Introduction

Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. There have been great efforts to address the accuracy and reproducibility of findings from 16S rRNA gene amplicon sequencing studies through increased levels of standardization, and software pipelines provide comprehensive protocols to analyze microbial ecology datasets. These efforts have arguably enhanced replicability rather than reproducibility, by providing widely adopted defaults[5]. To this end, Drummond[6] suggested that exact replication of an experiment (i.e., replicability) is less informative (although a necessary prerequisite for any scientific endeavour) than the corroboration of findings by reproduction in different independent setups (i.e., reproducibility)[7], because biological findings that are robust to independent methodologies are arguably more dependable than any single-track analysis[5]. This distinction is highly relevant for the field of microbial ecology, where replicability is often confused with reproducibility, as is apparent from the many, often non-interchangeable, methodologies.
