Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox

Jakob Wirbel, Nicolai Karcher, Georg Zeller, Konrad Zych, Shinichi Sunagawa, Peer Bork, Ece Kartal, Morgan Essex, Guillem Salazar

doi:10.1186/s13059-021-02306-1

Abstract

The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could however be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.

Highlights

The study of microbial communities through metagenomic sequencing has begun to uncover how communities are shaped by, and interact with, their environment, including the host organism in the case of gut microbes [1, 2].
Machine learning and statistical analysis workflows implemented in SIAMCAT: the SIAMCAT R package is a versatile toolbox for analyzing microbiome data from case-control studies.
When comparing taxonomic and functional profiles derived from the same dataset, we found a high correlation between AUROC values (Pearson’s r = 0.92, P < 2 × 10^-16); on average, taxonomic profiles performed slightly better than functional profiles (Additional file 1: Figure S7).
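
The comparison in the last highlight rests on two quantities: an AUROC per dataset and profile type, and a Pearson correlation across those AUROC values. The following is a minimal Python sketch of both, not SIAMCAT's implementation (SIAMCAT is an R package). The function names `auroc` and `pearson_r` are hypothetical, and any values passed to them here are illustrative rather than data from the paper.

```python
from math import sqrt


def auroc(scores, labels):
    """AUROC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In such a setup, one would compute `auroc` once per dataset for the taxonomic model and once for the functional model, then pass the two resulting lists to `pearson_r`.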

Summary

Introduction

The study of microbial communities through metagenomic sequencing has begun to uncover how communities are shaped by, and interact with, their environment, including the host organism in the case of gut microbes [1, 2]. As the microbiome is increasingly recognized as an important factor in health and disease, many possibilities for clinical applications are emerging for diagnosis [8, 9], prognosis, or prevention of disease [10]. The prospect of clinical applications comes with an urgent need for methodological rigor in microbiome analyses in order to ensure the robustness of findings. It is necessary to assess the clinical value of biomarkers identified from the microbiome in an unbiased manner: by their statistical significance, but more importantly by their prediction accuracy on independent samples. Additional issues arise from key characteristics of metagenomic data such as large technical and inter-individual variation [12], experimental bias [13], compositionality of relative abundances, zero inflation, and non-Gaussian distribution, all of which necessitate data normalization in order for ML algorithms to work well.
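
To make the normalization point concrete, here is a minimal Python sketch of one common approach: a log transform with a pseudocount (so that zeros stay finite), followed by per-feature standardization. It is an illustration under stated assumptions, not SIAMCAT's own code. The function name `log_standardize` and the default pseudocount of 1e-6 are hypothetical choices, and the transform does not by itself resolve compositionality.

```python
from math import log10, sqrt


def log_standardize(abundances, pseudocount=1e-6):
    """Log-transform and z-score relative abundances, feature by feature.

    abundances: list of samples, each a list of relative abundances
                (one value per feature, same feature order in every sample).
    pseudocount: small constant added before the log so zeros stay finite
                 (an assumed value, to be tuned to the data).
    """
    if not abundances:
        return []
    n_samples = len(abundances)
    n_features = len(abundances[0])
    logged = [[log10(v + pseudocount) for v in sample] for sample in abundances]
    result = [[0.0] * n_features for _ in range(n_samples)]
    for j in range(n_features):
        column = [row[j] for row in logged]
        mean = sum(column) / n_samples
        if n_samples > 1:
            sd = sqrt(sum((v - mean) ** 2 for v in column) / (n_samples - 1))
        else:
            sd = 0.0
        for i in range(n_samples):
            # Features with no variation carry no signal; map them to zero.
            result[i][j] = (column[i] - mean) / sd if sd > 0 else 0.0
    return result
```

When prediction accuracy is to be judged on independent samples, the per-feature mean and standard deviation would normally be estimated on the training samples only and then reused on the held-out samples; this sketch omits that step for brevity.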

Journal: Genome Biology	Publication Date: Mar 30, 2021
Citations: 153	License type: open-access
