Abstract

Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, which raises questions about how well disease-predictive models generalize across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and to the quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and the presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed on a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and by the use of strain-specific markers instead of species-level taxonomic abundances. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, adding healthy (control) samples from other studies to training sets improved disease-prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
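
The contrast between within-study cross-validation and cross-study transfer described in the abstract can be illustrated with a minimal sketch. This is not the framework's implementation: the data are synthetic, a simple nearest-centroid classifier stands in for whatever classifiers the framework uses, and every function name and parameter here (make_study, batch_shift, fit_centroids, cross_validate) is a hypothetical chosen for illustration.

```python
import random

def make_study(n_samples, batch_shift, seed):
    """Synthetic stand-in for one cohort: 5 'species' features, label 1 = disease."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate control (0) and disease (1)
        # batch_shift mimics a study-specific technical offset (an assumption).
        row = [rng.random() + batch_shift for _ in range(5)]
        row[0] += 0.8 * label  # feature 0 carries the disease signal
        total = sum(row)
        X.append([v / total for v in row])  # normalise to relative abundances
        y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'model': the mean profile of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def cross_validate(X, y, k=5):
    """Within-study k-fold cross-validation; fold f holds out samples f, f+k, f+2k, ..."""
    scores = []
    for fold in range(k):
        held_out = list(range(fold, len(y), k))
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k

# Two synthetic "studies" that share the disease signal but differ in batch offset.
X_a, y_a = make_study(60, batch_shift=0.0, seed=1)
X_b, y_b = make_study(60, batch_shift=0.3, seed=2)

cv_acc = cross_validate(X_a, y_a)                              # within-study estimate
transfer_acc = accuracy(fit_centroids(X_a, y_a), X_b, y_b)     # train on A, test on B

print(f"within-study CV accuracy: {cv_acc:.2f}")
print(f"cross-study transfer accuracy: {transfer_acc:.2f}")
```

Comparing the two printed numbers gives a rough feel for how a study-specific offset can affect a transferred model, although on real data the gap depends on the cohorts, the features, and the classifier.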

Highlights

  • The human microbiome constitutes the whole set of microbial organisms associated with the human host

  • Meta-analysis has been broadly adopted in other genomics applications, such as for analysis of microarray or RNA-seq data, where multiple studies have been performed for a similar purpose including identifying gene expression signatures of specific human cancers

  • We considered six available disease-associated metagenomic datasets spanning five diseases: liver cirrhosis [33], colorectal cancer [34], inflammatory bowel diseases (IBD) [35], obesity [31], and type 2 diabetes

Introduction

The human microbiome constitutes the whole set of microbial organisms associated with the human host. Even when the findings are not immediately relevant for the clinical setting, identifying associations between the microbiome and specific diseases is essential for follow-up mechanistic studies. Next-generation DNA sequencing technologies permit comprehensive profiling of the microbial communities from human-associated samples, and they are now used widely enough to enable meta-analysis for discovering patterns common to independent studies. Meta-analysis has been broadly adopted in other genomics applications, such as the analysis of microarray or RNA-seq data, where multiple studies have been performed for a similar purpose, for example identifying gene expression signatures of specific human cancers. Rigorous meta-analyses are crucial both to validate the findings of each single study and to provide robust models for clinical purposes.
