Abstract

While metagenomic sequencing has become the preferred tool for studying host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices. Here, we evaluate both computational and experimental approaches proposed to mitigate the impact of these outstanding issues. Generating fecal metagenomes drawn from simulated microbial communities, we benchmark the performance of thirteen commonly used analytical approaches in terms of diversity estimation, identification of taxon-taxon associations, and assessment of taxon-metadata correlations under the challenge of varying microbial ecosystem loads. We find that quantitative approaches, which include experimental procedures to incorporate microbial load variation in downstream analyses, perform significantly better than computational strategies designed to mitigate data compositionality and sparsity, not only improving the identification of true positive associations but also reducing false positive detection. When analyzing simulated scenarios of low microbial load dysbiosis, as observed in inflammatory pathologies, quantitative methods correcting for sampling depth show higher precision than uncorrected scaling. Overall, our findings advocate for a wider adoption of experimental quantitative approaches in microbiome research, yet also suggest preferred transformations for specific cases where determining the microbial load of samples is not feasible.

Highlights

  • While metagenomic sequencing has become the preferred tool for studying host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices

  • To benchmark these data transformation approaches in metagenomic data analysis, we assessed their performance in terms of richness evaluation, identification of taxon–metadata correlations, and detection of taxon–taxon associations in three distinct ecological scenarios (Fig. 1b and Supplementary Data 1)

  • Starting from a set of three realistic ecological scenarios, we here benchmarked 13 different analytical approaches for microbiome research to characterize their potential to deal with data compositionality and varying sampling depths, challenges encountered both in amplicon sequencing and shotgun metagenomics


Summary

Introduction

While metagenomic sequencing has become the preferred tool for studying host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices. We evaluate both computational and experimental approaches proposed to mitigate the impact of these outstanding issues. The resulting sequence matrices can only be analyzed in terms of relative proportions of microbial features (taxa or functions) present in a sample of an unquantified community[3]. Within such proportional data structures, relative abundances are not independent (data compositionality).
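The dependence between relative abundances can be illustrated with a small numerical sketch. The example below is not the benchmarking procedure of the study; it is a toy illustration in which the taxon names, cell counts, and helper functions (`relative_abundances`, `quantitative_profile`) are hypothetical, and sequencing depth and sparsity are deliberately ignored. It assumes that the total microbial load of each sample is known, as it would be when load is determined experimentally.

```python
def relative_abundances(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def quantitative_profile(proportions, microbial_load):
    """Rescale proportions by a known total microbial load (hypothetical helper)."""
    return {taxon: p * microbial_load for taxon, p in proportions.items()}


# Hypothetical absolute cell counts: taxon_A is unchanged between samples,
# while taxon_B triples.
sample_1 = {"taxon_A": 100, "taxon_B": 100}
sample_2 = {"taxon_A": 100, "taxon_B": 300}

rel_1 = relative_abundances(sample_1)
rel_2 = relative_abundances(sample_2)

# In proportional data, taxon_A appears to decrease even though its
# absolute abundance did not change.
print(rel_1["taxon_A"], rel_2["taxon_A"])

# Rescaling by the total load of each sample recovers the absolute values.
qmp_1 = quantitative_profile(rel_1, sum(sample_1.values()))
qmp_2 = quantitative_profile(rel_2, sum(sample_2.values()))
print(qmp_1["taxon_A"], qmp_2["taxon_A"])
```

In this sketch, the apparent drop of `taxon_A` is produced entirely by the growth of `taxon_B`, which is the kind of spurious signal that varying microbial loads can introduce into taxon-taxon and taxon-metadata analyses of proportional data.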

