Abstract

Background
Inflammatory bowel disease (IBD), encompassing Crohn's disease (CD) and ulcerative colitis (UC), is strongly associated with gut microbiota dysbiosis. Numerous association studies have proposed diagnostic gut microbial markers, but their clinical utility remains limited by substantial variability across studies and a lack of robust validation. Moreover, although multiple analytical approaches exist for identifying microbial biomarkers, their relative effectiveness is unclear, and definitive biomarkers for IBD diagnosis are still lacking. This study aims to identify robust gut microbial biomarkers for IBD diagnosis by comparing multiple analytical approaches.

Methods
We analysed 3,762 participants, including healthy controls (HC, n = 2,467) and patients with IBD (n = 1,293), the latter stratified into CD (n = 879) and UC (n = 414). Gut microbiota were profiled from stool samples using 16S rRNA gene sequencing. Four analytical approaches were employed: (1) differential abundance analyses (DA: LEfSe, ANCOM-BC, and MaAsLin2), (2) supervised random forest machine learning (ML), (3) unsupervised network analysis (NW), and (4) literature-based curation (LC). The biomarker candidates generated by each approach were compared for diagnostic performance using an ensemble ML model.

Results
Diversity analysis revealed distinct gut microbial community structures among the HC, UC, and CD groups (unweighted UniFrac distance, PERMANOVA, p < 0.001; Fig. 1A). Dysbiosis scores, calculated from these structural differences, were significantly higher in the IBD groups than in HC (Fig. 1B). Each analytical approach identified distinct microbial biomarkers associated with IBD and its subtypes. Supervised ML produced the most effective biomarker sets, achieving the highest diagnostic performance for distinguishing IBD from HC (AUC = 0.971) and surpassing LC (AUC = 0.938), DA (AUC = 0.928), and NW (AUC = 0.912) (Fig. 2). ML biomarker sets also demonstrated superior diagnostic performance in distinguishing CD from HC (AUC = 0.933) and UC from CD (AUC = 0.892). These results highlight the superiority of ML in selecting robust biomarkers for IBD diagnostics across multiple classification tasks.

Conclusion
This study identifies robust gut microbial biomarkers for IBD diagnosis through multiple analytical approaches. The biomarker panel shows high diagnostic performance in distinguishing both IBD from healthy controls and CD from UC. External validation in an independent cohort is ongoing. These findings provide a foundation for developing reliable microbiome-based diagnostic tools for IBD, potentially enabling more precise diagnosis and improved patient care.
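
The AUC values used to rank the biomarker sets can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The following is a minimal sketch of that comparison, not the study's pipeline: the function `roc_auc`, the set names `set_a` and `set_b`, and all scores are hypothetical illustrations, and the AUC is computed by the rank-based (Mann-Whitney) definition rather than by the ensemble model described above.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; tied scores count as half a correct ranking.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                correct += 1.0
            elif c == h:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical toy data: 1 = IBD, 0 = HC. Scores are invented classifier
# outputs for two made-up candidate biomarker sets, not study results.
labels = [1, 1, 1, 0, 0, 0]
candidate_scores = {
    "set_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "set_b": [0.6, 0.3, 0.7, 0.5, 0.4, 0.2],
}

# Score every candidate set, then pick the one with the highest AUC.
aucs = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best = max(aucs, key=aucs.get)
```

In practice the scores would come from held-out predictions (for example, cross-validation folds) so that the AUC is not inflated by evaluating on the data used to select the biomarkers.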