P389 Comparative Evaluation of ChatGPT and Human Specialists in the Application of ECCO Guidelines for the Management of Inflammatory Bowel Diseases and Malignancies: A Proof-of-Concept Study

D Ben-Hur, I Maza, R Weisshof, Y Gorelik, H Bar-Yoseph, E Koifman, I Ghersin, M Waterman

doi:10.1093/ecco-jcc/jjad212.0519

Abstract

Background: Societal guidelines on colorectal dysplasia screening, surveillance and endoscopic management in inflammatory bowel diseases (IBD) are complex, and physician adherence to them is suboptimal. We aimed to evaluate whether ChatGPT, a large language model, can generate accurate recommendations for colorectal dysplasia screening, surveillance and endoscopic management in IBD in line with European Crohn’s and Colitis Organization (ECCO) guidelines.

Methods: Thirty free-text clinical scenarios regarding colorectal dysplasia in IBD were prepared and presented to ChatGPT and to four gastroenterologists, two specializing in IBD and two with non-IBD specialties. Two additional IBD specialists subsequently assessed all responses provided by ChatGPT and the four gastroenterologists, judging their accuracy according to ECCO guidelines.

Results: ChatGPT provided accurate recommendations in 27/30 cases (90%), while the correct response rates of the four gastroenterologists were 28/30 (93%), 23/30 (77%), 26/30 (87%), and 25/30 (83%). The latter two rates are those of the IBD experts. No statistically significant differences in accuracy were observed between ChatGPT and all gastroenterologists (p=0.44), or between ChatGPT and the IBD expert and non-IBD gastroenterologist subgroups (p=0.71).

Conclusion: This study highlights the potential of language models to enhance guideline adherence regarding colorectal dysplasia in IBD. Further investigation of additional resources and prospective evaluation in real-world settings are warranted.
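The percentages in the Results can be checked against the raw counts with a short sketch. The rater labels below are hypothetical names chosen for illustration; the abstract states only that the latter two rates belong to the IBD experts. The pooled gastroenterologist figure is derived here and is not reported in the abstract. The p-values are not reproduced, because the abstract does not name the statistical test that was used.

```python
# Correct responses out of 30 scenarios, as reported in the Results.
# Rater labels are hypothetical; the abstract does not name the raters.
TOTAL_SCENARIOS = 30
correct = {
    "ChatGPT": 27,
    "Non-IBD gastroenterologist 1": 28,
    "Non-IBD gastroenterologist 2": 23,
    "IBD expert 1": 26,
    "IBD expert 2": 25,
}


def accuracy_percent(n_correct, n_total=TOTAL_SCENARIOS):
    """Return accuracy as a percentage rounded to the nearest integer."""
    return round(100 * n_correct / n_total)


percentages = {name: accuracy_percent(n) for name, n in correct.items()}

# Pooled accuracy of the four gastroenterologists
# (derived here for illustration, not reported in the abstract).
human_correct = sum(n for name, n in correct.items() if name != "ChatGPT")
pooled_human_percent = accuracy_percent(human_correct, 4 * TOTAL_SCENARIOS)

for name, pct in percentages.items():
    print(f"{name}: {correct[name]}/{TOTAL_SCENARIOS} = {pct}%")
print(
    f"Pooled gastroenterologists: {human_correct}/{4 * TOTAL_SCENARIOS}"
    f" = {pooled_human_percent}%"
)
```

The rounded values match the rates quoted in the Results; with only 30 scenarios per rater, each single response shifts a rate by roughly 3 percentage points.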
