Abstract
Purpose: The ability to properly analyze and interpret large microbiome data sets has lagged behind our ability to acquire such data sets from environmental or clinical samples. Sequencing instruments impose a structure on these data: the natural sample space of a 16S rRNA gene sequencing data set is a simplex, which is a part of real space that is restricted to nonnegative values with a constant sum. Such data are compositional and should be analyzed using compositionally appropriate tools and approaches. However, most of the tools for 16S rRNA gene sequencing analysis assume these data are unrestricted.
Methods: We show that existing tools for compositional data (CoDa) analysis can be readily adapted to analyze high-throughput sequencing data sets.
Results: The Human Microbiome Project tongue versus buccal mucosa data set shows how the CoDa approach can address the major elements of microbiome analysis. Reanalysis of a publicly available autism microbiome data set shows that the CoDa approach, in concert with multiple hypothesis test corrections, prevents false positive identifications.
Conclusions: The CoDa approach is readily scalable to microbiome-sized analyses. We provide example code and make recommendations to improve the analysis and reporting of microbiome data sets.
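To make the simplex constraint concrete, the following is a minimal sketch in Python with NumPy, not the example code referred to in the abstract. It shows that read counts from the same sample at two sequencing depths map to the same point on the simplex, and it applies the centered log-ratio (CLR) transform, a standard CoDa tool. The function names, the toy counts, and the pseudocount of 0.5 used to handle zeros are all illustrative assumptions.

```python
import numpy as np


def closure(counts):
    """Scale each sample (row) to proportions that sum to 1 (a point on the simplex)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=-1, keepdims=True)


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform: log of each part minus the mean log of the sample.

    The pseudocount is an illustrative way to avoid log(0); it is an assumption,
    not a recommendation taken from the source.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Hypothetical counts for four taxa in one sample, and the same sample
# sequenced ten times deeper.
sample = np.array([120, 30, 5, 845])
deeper = sample * 10

# Both depths give identical proportions: only relative information survives.
same_proportions = np.allclose(closure(sample), closure(deeper))

# CLR values are centered, so they sum to zero within a sample.
clr_sum = clr(sample).sum()

# With strictly positive counts and no pseudocount, CLR is unaffected by depth.
depth_invariant = np.allclose(clr(sample, pseudocount=0.0),
                              clr(deeper, pseudocount=0.0))
```

The total read count here is set by sequencing depth rather than by the sample itself, which is why a ratio-based transform such as CLR is used before applying methods that assume unrestricted real-valued data.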