Abstract

High-throughput sequencing of PCR-amplified taxonomic markers (such as the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models, as well as nonparametric testing using community networks and the ggnetwork package.

Highlights

  • The microbiome is formed from the ecological communities of microorganisms that dominate the living world

  • Previous standard workflows depended on clustering all 16S rRNA sequences that occur within a 97% radius of similarity and assigning these to ‘Operational Taxonomic Units’ (OTUs) from reference trees [1,2]

  • We have shown how a complete workflow in R is available to denoise, identify and normalize next-generation amplicon sequencing reads using probabilistic models with parameters fit using the data at hand
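
The subsampling normalization used by the earlier standard workflows can be sketched in base R as follows. This is a minimal illustration only, not the workflow's own code: the function name `rarefy_counts`, the matrix `counts`, and the toy values are hypothetical, and the article argues that statistical models give more accurate abundance estimates than this approach.

```r
# Rarefy a taxa-by-sample count matrix to a common library size (base R only).
# NOTE: `rarefy_counts`, `counts`, and the toy values are hypothetical names
# and data introduced for illustration; they do not come from the article.
rarefy_counts <- function(counts, depth = min(colSums(counts)), seed = 1) {
  stopifnot(is.matrix(counts), all(counts >= 0),
            depth <= min(colSums(counts)))
  set.seed(seed)  # fix the random draw so the result is reproducible
  out <- apply(counts, 2, function(sample_counts) {
    # Expand the counts into one entry per read, labelled by taxon index
    reads <- rep(seq_along(sample_counts), times = sample_counts)
    # Draw `depth` reads without replacement
    kept <- reads[sample.int(length(reads), depth)]
    # Re-tabulate per taxon, keeping taxa whose count dropped to zero
    tabulate(kept, nbins = length(sample_counts))
  })
  dimnames(out) <- dimnames(counts)
  out
}

# Toy data: two samples with a tenfold difference in sequencing depth
counts <- matrix(c(50, 30, 20,
                   500, 300, 200),
                 nrow = 3,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 c("shallow", "deep")))

rarefied <- rarefy_counts(counts)
colSums(rarefied)  # every sample now has the same library size
```

In this toy case the deeper sample loses 900 of its 1,000 reads, which shows how much data subsampling can discard.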


Summary

Leo Lahti Finland

Zachary Charlop-Powers, The Rockefeller University, New York, USA. University of California, San Francisco, San Francisco, USA. Any reports and responses or comments on the article can be found at the end of the article. This article is included in the Bioconductor gateway. This article is included in the Phylogenetics collection.
