Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities

Jonathan L Golob, Noah G Hoffman, David N Fredricks, Elisa Margolis

doi:10.1186/s12859-017-1690-0

Abstract

Background: Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. Errors introduced at multiple steps in this process can affect the interpretation of the data. Here we evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic classification, and alpha- and beta-diversity measures for different settings in QIIME, MOTHUR and a pplacer-based classification pipeline, using a novel software package: DECARD.

Results: We generated 100 synthetic bacterial communities in silico, approximating human stool microbiomes, to be used as a gold standard for evaluating the colligative performance of microbiome analysis software. Our synthetic data closely matched the composition and complexity of actual healthy human stool microbiomes. Genus-level taxonomic classification was correct for only 50.4–74.8% of the source organisms, with miscall rates of 11.9–23.5%. Species-level classification was less successful (6.9–18.9% correct); miscall rates were comparable to those at the genus level (12.5–26.2%). The degree of miscall varied by clade of organism, pipeline and specific settings used. OTU generation accuracy varied by strategy (closed, de novo or subsampling), reference database, algorithm and software implementation. Shannon diversity estimation accuracy correlated generally with OTU-generation accuracy. Beta-diversity estimates with Double Principal Coordinate Analysis (DPCoA) were more robust against errors introduced in processing than Weighted UniFrac. The settings suggested in the tutorials were among the worst performing in all outcomes tested.

Conclusions: Even when using the same classification pipeline, the choice of OTU-generation strategy, reference database and downstream analysis methods can have a dramatic effect on the accuracy of taxonomic classification and of alpha- and beta-diversity estimation. Even minor changes in settings adversely affected the accuracy of the results, bringing them far from the best-observed result. Thus, specific details of how a pipeline is used (including OTU generation strategy, reference sets, clustering algorithm and specific software implementation) should be specified in the methods section of all microbiome studies. Researchers should evaluate their chosen pipeline and settings to confirm that they can adequately answer the research question, rather than assuming the tutorial or standard-operating-procedure settings will be adequate or optimal.
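
The correct and miscall percentages reported above rest on comparing each pipeline's taxonomic calls against the known source organisms of the synthetic communities. The following is a minimal sketch of that kind of scoring, not DECARD's implementation: the function name `score_calls`, the dictionary layout, and the three-way split into correct, miscalled and unresolved are all assumptions made for illustration, and the example data are invented.

```python
from collections import Counter


def score_calls(truth, calls):
    """Score genus-level calls against a gold standard.

    truth: dict mapping organism id -> true genus (the known source).
    calls: dict mapping organism id -> called genus, or None when the
           pipeline made no call at this rank (hypothetical layout).
    Returns the fraction of source organisms in each category.
    """
    if not truth:
        raise ValueError("truth must contain at least one organism")
    tally = Counter()
    for organism, true_genus in truth.items():
        called = calls.get(organism)
        if called is None:
            tally["unresolved"] += 1   # dropped or not resolved to genus
        elif called == true_genus:
            tally["correct"] += 1
        else:
            tally["miscalled"] += 1    # confidently assigned to the wrong genus
    total = len(truth)
    return {k: tally[k] / total for k in ("correct", "miscalled", "unresolved")}


# Invented toy data: four source organisms and one pipeline's calls.
truth = {
    "org1": "Bacteroides",
    "org2": "Faecalibacterium",
    "org3": "Blautia",
    "org4": "Roseburia",
}
calls = {
    "org1": "Bacteroides",
    "org2": "Faecalibacterium",
    "org3": "Ruminococcus",
    "org4": None,
}
rates = score_calls(truth, calls)
print(rates)
```

Keeping miscalls separate from unresolved calls matters for interpretation: an unresolved organism loses information, whereas a miscall asserts something false about the community.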

Highlights

Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities
Researchers often proceed to a classification step to identify each operational taxonomic unit (OTU) as representing a given already-known organism in a shared reference database
We developed a software package DECARD (Detailed Evaluation Creation and Analysis of Read Data) to generate realistic synthetic datasets for which we have a known source of the sequences to be used as a gold standard when evaluating microbiome analysis software

Summary

Introduction

Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. We evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic classification, and alpha- and beta-diversity measures for different settings in QIIME, MOTHUR and a pplacer-based classification pipeline, using a novel software package: DECARD. Next-generation sequencing of amplicons from a taxonomically informative gene (such as the small subunit ribosomal RNA gene) is useful for estimating the composition of microbial communities and has been widely applied in diverse environments. Researchers often proceed to a classification step to identify each OTU as representing a given already-known organism in a shared reference database. This process can connect the OTU sequences to the larger body of microbiological research, converting associations into a deeper understanding of the members of the community and their capabilities. Even within a given analysis pipeline, a variety of settings must be selected: which OTU-generation strategy should be used, which clustering algorithm, and which classifier and reference database?
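
To illustrate why the choice of OTU-generation strategy carries through to alpha-diversity, the sketch below computes the Shannon index (natural log) for a true community and for two distorted OTU tables. It is an illustration under stated assumptions, not the paper's evaluation code: the counts are invented, and the "split" and "merged" tables are hypothetical examples of over- and under-clustering.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Invented read counts for three true source organisms.
true_counts = [50, 30, 20]

# Hypothetical over-splitting: the third organism is divided into two OTUs.
split_counts = [50, 30, 10, 10]

# Hypothetical over-merging: the first two organisms collapse into one OTU.
merged_counts = [80, 20]

h_true = shannon(true_counts)
h_split = shannon(split_counts)
h_merged = shannon(merged_counts)

print(f"true:   {h_true:.3f}")
print(f"split:  {h_split:.3f}")
print(f"merged: {h_merged:.3f}")
```

In this toy case, splitting one organism into several OTUs inflates the index and merging distinct organisms deflates it, which is consistent with the reported correlation between Shannon estimation accuracy and OTU-generation accuracy.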

Journal: BMC Bioinformatics	Publication Date: May 30, 2017
Citations: 55	License type: open-access
