Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease.

Ryszard Kubinski, Sani Karam, Alex Hernandez-Garcia, Ryan D Martin, Falk Hildebrand, Kamran Kafi, Timur Zhanabaev, Stefan Bauer, Jean-Yves Djamen-Kepaou, Tamas Korcsmaros, Prévost Jantchou

doi:10.3389/fgene.2022.784397

Abstract

Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome’s composition are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods developed to analyze microbiome data may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
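The LODO validation and zero-centering steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `lodo_splits` and `zero_center_by_batch`, the study labels, and the toy feature values are all hypothetical, and the sketch assumes that "naive zero-centering" means subtracting each dataset's per-feature mean from its own samples.

```python
from collections import defaultdict


def group_indices(dataset_ids):
    """Map each dataset (study) label to the indices of its samples."""
    by_dataset = defaultdict(list)
    for i, dataset in enumerate(dataset_ids):
        by_dataset[dataset].append(i)
    return by_dataset


def lodo_splits(dataset_ids):
    """Yield (held_out, train_idx, test_idx), holding out one whole dataset per fold."""
    by_dataset = group_indices(dataset_ids)
    for held_out in sorted(by_dataset):
        test_idx = by_dataset[held_out]
        train_idx = [i for i, d in enumerate(dataset_ids) if d != held_out]
        yield held_out, train_idx, test_idx


def zero_center_by_batch(rows, dataset_ids):
    """Subtract each dataset's per-feature mean from that dataset's samples.

    Assumption: this is what the abstract calls naive zero-centering.
    """
    centered = [None] * len(rows)
    for idx in group_indices(dataset_ids).values():
        n_features = len(rows[idx[0]])
        means = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_features)]
        for i in idx:
            centered[i] = [rows[i][j] - means[j] for j in range(n_features)]
    return centered


# Hypothetical study labels for six samples drawn from three studies.
dataset_ids = ["study_a", "study_a", "study_b", "study_c", "study_c", "study_c"]
splits = list(lodo_splits(dataset_ids))

# Hypothetical two-feature samples from two studies.
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
centered = zero_center_by_batch(rows, ["a", "a", "b"])
```

Each fold trains on every study except one and tests on the held-out study, so no test sample shares a batch with the training data. In practice a library splitter such as scikit-learn's `LeaveOneGroupOut` could serve the same purpose; the plain-Python version here only makes the grouping explicit.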

Highlights

The human gut microbiome is a collection of microbes, viruses and fungi residing throughout the digestive tract
Our benchmark provides practical suggestions for ways to improve the performance of an inflammatory bowel disease (IBD) diagnostic test using the gut microbiome composition
Genus abundance estimates from 16S rRNA sequencing need to be normalized by a compositional transformation method, with centered log ratio (CLR) transformation being the most appropriate as it allows each feature’s importance to the machine learning (ML) model’s decision to be assessed
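As a rough illustration of the CLR transformation mentioned in the highlights, the sketch below takes the log of each genus count and subtracts the mean log value of the sample. The function name `clr`, the example counts, and the pseudocount of 0.5 used to handle zero counts are assumptions made for this example; the source does not state how zeros were treated.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's genus counts.

    The pseudocount is an assumed way of avoiding log(0), not taken from the paper.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is equivalent to dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical genus counts for a single sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

The transformed values of a sample sum to zero (up to floating-point error) and keep one value per genus, which is why each feature can still be inspected individually after the transform.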

Summary

Introduction

The human gut microbiome is a collection of microbes, viruses and fungi residing throughout the digestive tract. Alterations in the gut microbiome have been linked to illnesses such as multiple sclerosis, type II diabetes, and inflammatory bowel disease (IBD) (Gevers et al, 2014; Opazo et al, 2018). The prevalence of IBD has increased globally over the last several decades, from 79.5 to 84.3 per 100,000 people between 1990 and 2017, with Canada having among the highest IBD rates at 700 per 100,000 people in 2018 (Benchimol et al, 2019; GBD 2017 Inflammatory Bowel Disease Collaborators, 2020). Although the disease etiology is currently undetermined, the increasing rates of IBD have been linked to lifestyle factors, such as a Western diet (Rizzello et al, 2019).

Methods

Results

Conclusion


Journal: Frontiers in Genetics
Publication Date: Feb 14, 2022
Citations: 18
License type: CC BY 4.0
