Abstract

Background and aims
We aimed to evaluate the precision, medical accuracy, superfluous content, and consistency of ChatGPT's responses to commonly asked questions about endoscopic procedures, as well as its capability to provide emotional support, comparing its performance with the Generative Pre-trained Transformer 4 (GPT-4) model.

Methods
A set of 113 questions related to esophagogastroduodenoscopy (EGD), colonoscopy, endoscopic ultrasound (EUS), and endoscopic retrograde cholangiopancreatography (ERCP) was curated from professional societies and institutional web pages. Responses from ChatGPT were generated and subsequently graded by board-certified gastroenterologists and advanced endoscopists. The emotional support efficacy of ChatGPT and GPT-4 was also assessed by a board-certified psychiatrist (LSM).

Results
ChatGPT exhibited moderate precision in answering questions about EGD (57.9% comprehensive), colonoscopy (47.6% comprehensive), EUS (48.1% comprehensive), and ERCP (44.4% comprehensive). Medical accuracy was highest for EGD (52.6% fully accurate) and lowest for EUS (40.7% fully accurate). Regarding superfluous content, responses were predominantly concise for EGD and colonoscopy, whereas ERCP and EUS responses contained more extraneous content. Reproducibility scores varied across domains, ranging from 50.34% (EUS) to 68.6% (EGD). GPT-4 outperformed ChatGPT in providing emotional support, though both models performed satisfactorily.

Conclusion
ChatGPT delivers moderately precise and medically accurate answers to common questions about endoscopic procedures, with varying levels of extraneous content. It holds promise as a supplementary information resource for both patients and healthcare professionals.
