Abstract

Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the number of samples ten- to one hundredfold. This discrepancy decreases the power of studies to identify meaningful differences between samples, increases the likelihood of false positive results, and subsequently limits reproducibility. Despite the vast collections of microbiome data already available, biome-specific patterns of microbial structure are not currently leveraged to inform studies. Here, we derive microbiome-level properties by applying an embedding algorithm to quantify taxon co-occurrence patterns in over 18,000 samples from the American Gut Project (AGP) microbiome crowdsourcing effort. We then compare the predictive power of models trained using properties, normalized taxonomic count data, and another commonly used dimensionality reduction method, Principal Component Analysis (PCA), in categorizing samples from individuals with inflammatory bowel disease (IBD) and healthy controls. We show that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results. Furthermore, we find that properties correlate significantly with known metabolic pathways. Using these properties, we are able to extract known and new bacterial metabolic pathways associated with inflammatory bowel disease across two completely independent studies. By providing a set of pre-trained embeddings, we allow any V4 16S amplicon study to apply the publicly informed properties to increase the statistical power, reproducibility, and generalizability of analysis.
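
To make "quantifying taxon co-occurrence patterns" concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes that two taxa co-occur in a sample when both have a non-zero count, and that the co-occurrence value for a pair is the number of samples in which both are present. The function name `cooccurrence_counts` and the toy table are illustrative only.

```python
import numpy as np

def cooccurrence_counts(counts):
    """Count, for every pair of taxa, the samples in which both are present.

    counts: (n_samples, n_taxa) array of non-negative read counts.
    Presence is assumed to mean a count greater than zero (an assumption
    made for this sketch, not a detail taken from the text).
    """
    present = (np.asarray(counts) > 0).astype(np.int64)
    cooc = present.T @ present   # (n_taxa, n_taxa), symmetric by construction
    np.fill_diagonal(cooc, 0)    # a taxon co-occurring with itself is not informative
    return cooc

# Toy sample-by-taxon table: 4 samples, 3 taxa.
toy = np.array([[5, 0, 2],
                [1, 3, 0],
                [0, 4, 7],
                [2, 1, 0]])
cooc = cooccurrence_counts(toy)
print(cooc)
```

A matrix of this kind is the sort of input that an embedding algorithm factorizes into a lower-dimensional representation of each taxon.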

Highlights

  • Recent findings suggest that resident microbiomes of the human anatomy influence our bodies and minds in ways we have only just begun to understand

  • By applying the GloVe algorithm to 18,480 gut microbiome samples from the American Gut Project, we construct a 26,726 by 26,726 Amplicon Sequence Variant (ASV) co-occurrence matrix, and subsequently a 26,726 ASV by 100 property embedding transformation matrix, which can be used to project any sample-by-ASV table into embedding space

  • We test the performance of models trained on taxonomic counts, embedded data, and PCA-transformed data across combinations of training/testing datasets: 1. Models trained and tested on the same dataset used to construct embeddings (AGP) 2
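
The projection mentioned in the highlights above can be sketched as a single matrix product: a sample-by-ASV table multiplied by an ASV-by-property matrix yields a sample-by-property table. The sketch below is hedged in several ways: the embedding matrix is random placeholder data rather than the pre-trained one, the sizes are toy values (the text describes 26,726 ASVs and 100 properties), and the conversion of counts to relative abundances before projecting is an assumption of this example. The function name `project` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_asvs, n_props = 4, 50, 10   # toy sizes, not the paper's dimensions

# Placeholder standing in for a pre-trained ASV-by-property matrix.
embedding = rng.normal(size=(n_asvs, n_props))
# Placeholder sample-by-ASV count table.
counts = rng.integers(0, 20, size=(n_samples, n_asvs))

def project(counts, embedding):
    """Project a sample-by-ASV table into property (embedding) space.

    Counts are first scaled to relative abundances per sample; this
    normalization is an assumption made for the sketch.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0            # leave all-zero samples as zeros
    rel_abundance = counts / totals
    return rel_abundance @ embedding     # (n_samples, n_props)

embedded = project(counts, embedding)
print(embedded.shape)
```

The ASV columns of the count table must be in the same order as the rows of the embedding matrix. In practice this means aligning each study's ASVs to the reference set before projecting.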


Summary

Introduction

Microbial survey studies

Recent findings suggest that resident microbiomes of the human anatomy influence our bodies and minds in ways we have only just begun to understand. Current technology sequences various hypervariable regions of the 16S rRNA gene, which acts as an accessible taxonomic tag to measure the abundances of taxa in a community. Studies using this 16S survey technique have reported incredibly diverse collections of microbes in several systems. Consortium studies like the American Gut Project (AGP) [24] and the Human Microbiome Project [25] have invested colossal effort to document that diversity by creating publicly available reference repositories. Amongst these are repositories of stool-associated microbiota that have furthered our understanding of the role of the microbiome in several diseases, especially inflammatory bowel disease (IBD) [4].

