Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies.

Jonathan Thorsen, Hans Bisgaard, Martin Mortensen, Johannes Waage, Asker Brejnrod, Waleed Abu Al-Soud, Jakob Stokholm, Morten A Rasmussen, Søren Sørensen

doi:10.1186/s40168-016-0208-8

Abstract

Background: There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, a challenge often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.

Results: Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.

Conclusions: Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.

Electronic supplementary material: The online version of this article (doi:10.1186/s40168-016-0208-8) contains supplementary material, which is available to authorized users.
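The abstract separates two choices in beta-diversity analysis: whether counts are normalized for library size, and which distance metric is used. The sketch below shows where each choice enters the computation. It is a minimal illustration, not the study's framework, and assumes total-sum scaling (conversion to proportions) as the normalization and Bray-Curtis and binary Jaccard as example metrics; the function names are hypothetical.

```python
from itertools import combinations


def total_sum_scale(counts):
    """Library size normalization by total-sum scaling: counts -> proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bray_curtis(x, y):
    """Abundance-weighted Bray-Curtis dissimilarity between two samples."""
    denom = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


def jaccard(x, y):
    """Binary (presence/absence) Jaccard distance between two samples."""
    present_x = {i for i, v in enumerate(x) if v > 0}
    present_y = {i for i, v in enumerate(y) if v > 0}
    union = present_x | present_y
    return 1 - len(present_x & present_y) / len(union) if union else 0.0


def distance_matrix(samples, metric, normalize=False):
    """Symmetric pairwise distance matrix, optionally after normalization."""
    # Normalization is applied per sample before any distance is computed.
    data = [total_sum_scale(s) if normalize else list(s) for s in samples]
    n = len(data)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = metric(data[i], data[j])
    return dist
```

For two samples with identical composition sequenced at 100 and 1,000 reads (`[10, 0, 30, 60]` and `[100, 0, 300, 600]`), Bray-Curtis on raw counts gives 900/1100 ≈ 0.82, after scaling it gives 0, and the presence/absence-based Jaccard distance is 0 either way. How much such effects matter in real data is an empirical question; the study reports that, on its datasets, the metric mattered more than normalization.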

Highlights

There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease.
Study design and data characteristics: The study was divided into three parts (Fig. 1), namely (1) false positive rate (FPR) testing, (2) spike-in retrieval testing, and (3) beta-diversity optimization.
False positive rates: We found striking differences in the FPR of the tested methods using identical permutations of the three large datasets A1–A3 (Additional file 3: Figure S2A).
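The first two parts of the study, FPR testing on permuted case/control labels and spike-in retrieval, can be sketched as follows. This is a minimal illustration, not the study's benchmarking framework. It assumes a Wilcoxon rank-sum test (normal approximation, without tie or continuity correction) as the per-OTU method, counts every nominal p < 0.05 on permuted labels as a false positive without multiple-testing correction, and spikes OTUs by multiplying their counts in cases by a fixed fold change. All function names and parameters (`permutation_fpr`, `spike_in`, `retrieval_rate`, `fold`, `n_perm`) are hypothetical.

```python
import math
import random


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(cases, controls):
    """Two-sided Wilcoxon rank-sum p-value using the normal approximation.

    Simplification: no tie correction, which makes the test conservative
    when many counts are tied (e.g. sparse OTUs with many zeros).
    """
    n1, n2 = len(cases), len(controls)
    ranks = rank_with_ties(list(cases) + list(controls))
    w = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def split_by_label(otu_counts, labels):
    """Split one OTU's per-sample counts into (cases, controls) by 1/0 labels."""
    cases = [c for c, g in zip(otu_counts, labels) if g == 1]
    controls = [c for c, g in zip(otu_counts, labels) if g == 0]
    return cases, controls


def permutation_fpr(otu_table, labels, n_perm=20, alpha=0.05, seed=0):
    """Share of OTU tests with p < alpha after shuffling case/control labels.

    Shuffled labels carry no true group difference, so every hit is a
    false positive.
    """
    rng = random.Random(seed)
    hits = tests = 0
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        for otu in otu_table:
            if rank_sum_pvalue(*split_by_label(otu, shuffled)) < alpha:
                hits += 1
            tests += 1
    return hits / tests


def spike_in(otu_table, labels, spiked_otus, fold=5):
    """Hypothetical spike-in: multiply chosen OTUs' counts in cases by `fold`."""
    spiked = [list(row) for row in otu_table]
    for i in spiked_otus:
        spiked[i] = [c * fold if g == 1 else c for c, g in zip(spiked[i], labels)]
    return spiked


def retrieval_rate(otu_table, labels, spiked_otus, alpha=0.05):
    """Share of spiked OTUs detected at p < alpha with the true labels."""
    found = sum(
        rank_sum_pvalue(*split_by_label(otu_table[i], labels)) < alpha
        for i in spiked_otus
    )
    return found / len(spiked_otus)
```

Because spiking is multiplicative in this sketch, zero counts stay zero, so an all-zero OTU receives no signal at all and cannot be retrieved; this illustrates one way sparsity can depress spike-in retrieval.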

Summary

Introduction

Technical advances in DNA sequencing have allowed for the collection of high-dimensional biological data on an unprecedented scale. This development has ignited a surge of scientific opportunities and interest in the human microbiome and its effects on human physiology, health, and disease [1, 2]. Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, a challenge often exacerbated by exploratory and/or unbalanced study designs. In the RNA-seq field, which is based on similar sequencing technology, explicit modeling of the mean-variance relationship of read counts has been attempted [10, 11].
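The text does not spell out what such a mean-variance model looks like. One standard form, shown here as an illustration and not necessarily the exact model of [10, 11], is the negative binomial (NB2) parameterization, where Y_ij is the read count of feature i (gene or OTU) in sample j, μ_ij its mean, and φ_i ≥ 0 a feature-specific dispersion. A worked instance shows how far the variance can exceed the Poisson assumption:

```latex
Y_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi_i), \qquad
\mathbb{E}[Y_{ij}] = \mu_{ij}, \qquad
\operatorname{Var}(Y_{ij}) = \mu_{ij} + \phi_i \mu_{ij}^{2}

\phi_i = 0 \;\Rightarrow\; \operatorname{Var}(Y_{ij}) = \mu_{ij} \quad \text{(Poisson)}

\mu_{ij} = 100,\ \phi_i = 0.5: \quad
\operatorname{Var}(Y_{ij}) = 100 + 0.5 \cdot 100^{2} = 5100
```

In this example the variance is 51 times what a Poisson model with the same mean would predict, which is why count models that ignore overdispersion tend to understate uncertainty.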

Objectives

Methods

Results

Discussion

Conclusion