P717 Evaluating the performance of Large Language Models in responding to patients' health queries: A comparative analysis with medical experts

Z Yan, D Xu, J Mao, S Lu, Y Fan, H C Tseng, Y Chen, H Wang, Y Yang

doi:10.1093/ecco-jcc/jjad212.0847

Abstract

Background

Patients with chronic diseases exhibit a heightened interest in seeking health information, and access to high-quality information can positively impact clinical outcomes. While previous research on static internet text/video information has raised concerns that the low barrier to content creation leads to low-quality content, it remains uncertain whether similar issues persist in responses generated by Large Language Models (LLMs). Assessing the ability of LLMs to respond to medical queries provides valuable insights for their application in healthcare settings.

Methods

In alignment with open science principles, we used real patient queries from the China Crohn's and Colitis Foundation (CCCF) series "Questions and Answers on Ulcerative Colitis and Crohn's Disease." The dataset comprised questions posed by patients and corresponding answers from medical professionals, collected from outpatient visits and online social media. In September 2023, 263 patient questions were sequentially input into ChatGPT-3.5 (August 3, 2023 version), and the resulting responses were compiled alongside the original medical professional responses, forming 263 modules. Three Inflammatory Bowel Disease (IBD) specialist physicians and three IBD patients were invited to assess each module. Evaluators were instructed to: 1) choose their preferred response version, and 2) provide a multidimensional subjective assessment on a 5-point Likert scale, using a crowdsourcing strategy. Additionally, the CRIE 3.0 team conducted an automated objective analysis of Simplified Chinese readability.

Results

Mann-Whitney U tests on text readability levels (median: 7th grade for both medical professional and ChatGPT responses; Q1: 6th grade; Q3: 8th grade) revealed no significant difference (p=0.87), suggesting that ChatGPT's output aligns well with the literacy levels recommended for popular science publications and is comparable to the average education level in China.

Conclusion

Interpreted cautiously, our findings suggest that ChatGPT's preliminary performance is comparable to that of specialized IBD physicians, indicating its potential utility in patient community Q&A. Integrating ChatGPT or similar LLMs into the drafting or refinement stages of health texts is feasible. However, given the presence of AI hallucinations, and in line with the consensus of most experimental studies, direct use of large language models for patient Q&A services is not recommended. Recognizing the variability in health information understanding between medical professionals and patients can enhance patient education efforts.
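The readability comparison reported in the Results relies on a Mann-Whitney U test. The following is a minimal sketch of how such a comparison could be run, not the authors' analysis: the grade-level lists are made-up illustrative values (not study data), the function names `average_ranks` and `mann_whitney_u` are hypothetical, and the p-value uses a tie-corrected normal approximation without continuity correction, which may differ slightly from what a statistics package reports.

```python
import math
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)

    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if variance <= 0:  # every observation is identical
        return min(u1, u2), 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return min(u1, u2), p_two_sided


# Hypothetical readability grade levels, for illustration only.
physician_grades = [6, 7, 7, 8, 6, 7, 8, 7]
chatgpt_grades = [7, 6, 8, 7, 7, 6, 9, 7]

u_stat, p_value = mann_whitney_u(physician_grades, chatgpt_grades)
print("median physician grade:", median(physician_grades))
print("median ChatGPT grade:", median(chatgpt_grades))
print("U =", u_stat, "p =", round(p_value, 3))
```

A rank-based test suits this setting because grade levels are ordinal and heavily tied, so comparing means with a t-test would rest on assumptions the data are unlikely to meet.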

Source: Journal of Crohn's and Colitis
