Abstract

Background
Identifying molecular subtypes of IBD is essential to address inconsistencies in gene expression-based classifications, clinical variability, and treatment responses in Crohn's Disease (CD) and Ulcerative Colitis (UC). Building on prior efforts using different methods and datasets, this study aimed to derive and validate IBD subtypes using transcriptomics data and unsupervised machine learning.

Methods
This study analysed RNA-sequencing data from inflamed and non-inflamed intestinal biopsies of 2,490 adult IBD patients. K-means clustering, with the optimal ‘K’ chosen using the Within Cluster Sum of Squares (WCSS), identified subtypes within the dataset. Distinct clusters for UC and CD were derived from gene expression, and gene set enrichment and network analysis characterized their features. Statistical tests (Chi-squared and ANOVA) linked these clusters to clinical data for UC and CD.

Results
K-means clustering revealed three distinct clusters in each of UC and CD. Chi-squared tests showed that these clusters were significantly associated with IBD severity (UC: p = 0.000263; CD: p = 0.007006) and IBD region (p < 0.000001). ANOVA showed that age differed significantly across UC clusters (p = 0.0000345) but not across CD clusters (p = 0.285).

In UC, Cluster 1 was characterized by RNA processing, DNA repair, and rapid cell turnover, with upregulation of EXOSC genes and other related genes. Cluster 2 highlighted autophagy, stress response, and signaling processes, with upregulated expression of ATG13, VPS37C, and DVL2. Cluster 3 emphasized cytoskeletal stability over metabolic activity, marked by the upregulation of SRF, SRC, and ABL1. Notably, all UC clusters demonstrated upregulation of COX1, TMSB10, and ACTB.

In CD, Cluster 1 was defined by cytoskeletal dynamics and reduced protein synthesis, with upregulated expression of CFL1, F11R, and RAD23A. Cluster 2 exhibited increased protein synthesis and stress response pathways, associated with aggressive disease phenotypes, with upregulation of MTREX, SART3, and GTF3C3. Cluster 3 prioritised cytoskeletal organisation over metabolism, featuring upregulation of TESK1, ABL1, and DVL2, along with other genes. Across all CD clusters, COX1, CDH1, and SF3B1 were consistently upregulated.

Conclusion
Despite certain limitations, this study categorizes UC and CD into three transcriptomics-based subtypes, identifying meaningful IBD-associated patterns, key subtyping genes, and insights into the disease's complex pathogenesis. These findings may advance new therapeutic strategies and personalized medicine for patients with distinct IBD subtypes.
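
The WCSS-guided choice of K mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a small synthetic matrix in place of the expression data, and it uses a plain NumPy implementation of Lloyd's algorithm rather than whichever K-means implementation the study used. The names `kmeans_wcss`, `centres`, and `wcss_by_k` are hypothetical and introduced only for illustration.

```python
import numpy as np


def kmeans_wcss(X, k, n_init=10, max_iter=100, seed=0):
    """Run Lloyd's K-means from n_init random starts.

    Returns the lowest Within Cluster Sum of Squares (WCSS) found
    and the cluster labels that produced it.
    """
    rng = np.random.default_rng(seed)
    best_wcss, best_labels = np.inf, None
    for _start in range(n_init):
        # Initialise centroids with k distinct samples chosen at random.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(max_iter):
            # Squared Euclidean distance from every sample to every centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its members;
            # an empty cluster keeps its previous centroid.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            converged = np.allclose(new_centroids, centroids)
            centroids = new_centroids
            if converged:
                break
        # WCSS: sum of squared distances of samples to their assigned centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wcss = float(d2[np.arange(len(X)), labels].sum())
        if wcss < best_wcss:
            best_wcss, best_labels = wcss, labels
    return best_wcss, best_labels


# Synthetic stand-in for an expression matrix (samples x features):
# three groups of 60 samples around assumed, arbitrary centres.
rng = np.random.default_rng(42)
centres = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(60, 4)) for c in centres])

# WCSS for each candidate K; the "elbow" is where the curve stops dropping sharply.
wcss_by_k = {k: kmeans_wcss(X, k)[0] for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(f"K={k}: WCSS={w:.1f}")
```

Plotting `wcss_by_k` against K and looking for the point where further increases in K yield only small reductions is the usual way to read such a curve. On real expression data the elbow is often less sharp than on synthetic groups like these.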