Abstract
Background The management of inflammatory bowel disease (IBD) involves complex clinical scenarios, especially when complications arise. IBD care has traditionally been guided by clinical expertise and established guidelines, and artificial intelligence (AI) is now being explored as an adjunct in medical care. This study assesses ChatGPT’s ability to provide IBD management recommendations in both initial and complex cases. By comparing ChatGPT’s responses with both real-world clinical decisions and the medical treatments recommended in the European Crohn’s and Colitis Organisation (ECCO) guidelines, the study aims to evaluate the alignment, reliability, and potential of AI in IBD management.
Methods A retrospective analysis was conducted of the electronic medical records of IBD patients in both the initial and complicated phases of disease. Management recommendations for 19 cases were generated with ChatGPT-4o and compared with the treatments actually given and with the ECCO guidelines. A gastroenterologist certified by the Mexican Association of Gastroenterology oversaw the evaluations across seven treatment categories: 5-ASA, steroids, antibiotics, thiopurines, anti-TNF agents, anti-integrins, and anti-IL-23 agents. Agreement was assessed with Cohen’s kappa.
Results ChatGPT showed perfect agreement (kappa = 1.000) with healthcare providers and ECCO guidelines for antibiotics, diagnostic workups, symptom management, surgical consultations, monitoring, and anti-IL-23 recommendations. Substantial agreement (kappa ~0.6–0.8) was observed for 5-ASA and steroids, with minor variations but strong overall alignment. Moderate to fair agreement (kappa ~0.3–0.5) was found for anti-TNF agents and anti-integrins, reflecting variability in complex cases. Agreement for thiopurines was minimal, suggesting differences in treatment thresholds or interpretation (Figure 1).
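For readers unfamiliar with the statistic, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below is illustrative only. It assumes that each category was coded as a per-case label (for example, "yes"/"no" for whether a drug class was recommended), but the abstract does not state how the coding was done. The example labels are hypothetical and are not study data. The descriptive bands follow the widely used Landis and Koch convention; the abstract does not name the scale it used, so that choice is also an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e): p_o is observed agreement, p_e is the
    agreement expected by chance from each rater's marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case, so kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map kappa to Landis and Koch descriptive bands (assumed interpretation scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical recommendations for one drug class across 10 made-up cases.
    chatgpt = ["yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
    clinician = ["yes", "yes", "yes", "yes", "yes", "no", "yes", "no", "no", "no"]
    k = cohens_kappa(chatgpt, clinician)
    print(round(k, 3), landis_koch(k))  # prints: 0.583 moderate
```

In this toy example the raters agree on 8 of 10 cases (p_o = 0.80), and chance agreement is p_e = 0.52. Kappa is therefore about 0.58, which is noticeably lower than the raw 80% agreement. This gap is why kappa, rather than percent agreement, is used to judge alignment.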
Conclusion ChatGPT displayed perfect agreement with providers and guidelines in areas such as antibiotics, diagnostic workups, symptom management, surgical consultations, monitoring, and anti-IL-23 therapies, indicating its potential for standardising IBD care. However, discrepancies arose in complex cases involving anti-TNF agents and anti-integrins, possibly owing to the inclusion of newer treatments not yet covered by the ECCO guidelines. This underscores the need for AI systems to evolve alongside advancing treatment modalities. Differences in thiopurine recommendations highlight the enduring importance of clinical judgement. Notably, IBD management is not currently standardised, and this study illustrates ChatGPT’s promise as a decision-support tool toward that goal. Future efforts should focus on validating AI before applying it more widely as an adjunct in clinical decision-making, and on tailoring its recommendations to individual patients.