A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study.

Ryan C King, Ali A Habib, Yee Hui Yeo, Yuxin Peng, Roxana Ghashghaei, David C Kunkel, Jamil S Samaan

doi:10.2196/53421


Open Access

https://doi.org/10.2196/53421


Journal: JMIR Cardio; Publication Date: Apr 19, 2024
Citations: 6; License type: cc-by


Abstract


Amyloidosis, a rare multisystem condition, often requires complex, multidisciplinary care. Its low prevalence underscores the importance of ensuring that high-quality patient education materials are available to support better outcomes. ChatGPT (OpenAI) is a large language model powered by artificial intelligence that offers a potential avenue for disseminating accurate, reliable, and accessible educational resources to both patients and providers. Its user-friendly interface, engaging conversational responses, and support for follow-up questions make it a promising tool for delivering accurate, tailored information to patients.

We performed a multidisciplinary assessment of the accuracy, reproducibility, and readability of ChatGPT's answers to questions about amyloidosis. In total, 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups and entered into ChatGPT-3.5 and ChatGPT-4. Cardiology- and gastroenterology-related responses were independently graded by a board-certified cardiologist and a board-certified gastroenterologist, respectively, both of whom specialize in amyloidosis. These 2 reviewers (RG and DCK) also graded the general questions, resolving any disagreements through discussion. Neurology-related responses were graded by a board-certified neurologist (AAH) who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by category for further analysis. Reproducibility was assessed by entering each question twice into each model. The readability of ChatGPT-4 responses was also evaluated using the Textstat library in Python (Python Software Foundation) and the Textstat readability package in R software (R Foundation for Statistical Computing).
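The abstract does not state which readability formula(s) produced the reported grade levels. As a rough illustration of how a US grade-level score is computed, the sketch below implements the Flesch-Kincaid Grade Level, one of the formulas Textstat exposes (`textstat.flesch_kincaid_grade`). This is an assumption, not the authors' pipeline: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic, and its scores will not match Textstat's exactly.

```python
import re
from statistics import mean, stdev

# Hypothetical helpers for illustration; not the study's code or Textstat's internals.
_VOWEL_GROUPS = re.compile(r"[aeiouy]+")
_WORDS = re.compile(r"[A-Za-z']+")
_SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, discounting a silent final 'e'."""
    w = word.lower()
    n = len(_VOWEL_GROUPS.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)  # every word has at least one syllable


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = _WORDS.findall(text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def grade_level_summary(responses: list[str]) -> tuple[float, float]:
    """Mean and sample SD of per-response grade levels (SD is 0.0 for one response)."""
    grades = [flesch_kincaid_grade(r) for r in responses]
    sd = stdev(grades) if len(grades) > 1 else 0.0
    return mean(grades), sd


# Usage with made-up response texts (not data from the study):
responses = [
    "Amyloidosis is a rare disease. Proteins build up in organs.",
    "Transthyretin amyloidosis frequently involves cardiovascular complications.",
]
mean_grade, sd_grade = grade_level_summary(responses)
```

Reporting the mean and SD across responses mirrors the abstract's "average of 15.5 (SD 1.9)" format; longer sentences and more polysyllabic words both push the score up.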
ChatGPT-4 (n=98) provided 93 (95%) responses with accurate information, and 82 (84%) were comprehensive. ChatGPT-3.5 (n=83) provided 74 (89%) responses with accurate information, and 66 (79%) were comprehensive. By question category, ChatGPT-4 and ChatGPT-3.5 provided 53 (95%) and 48 (86%) comprehensive responses, respectively, to "general questions" (n=56). By subject, both models performed best on cardiology questions (n=12), each producing 10 (83%) comprehensive responses. For gastroenterology (n=15), 9 (60%) ChatGPT-4 responses and 8 (53%) ChatGPT-3.5 responses were graded comprehensive. Overall, 96 of 98 (98%) ChatGPT-4 responses and 73 of 83 (88%) ChatGPT-3.5 responses were reproducible. The readability of ChatGPT-4's responses ranged from a 10th-grade to beyond a graduate US grade level, with an average of 15.5 (SD 1.9).

Large language models are a promising tool for providing accurate and reliable health information to patients living with amyloidosis. However, ChatGPT's responses exceeded the American Medical Association's recommended fifth- to sixth-grade reading level. Future studies focused on improving response accuracy and readability are warranted. Before widespread adoption, the technology's limitations and ethical implications must be further explored to ensure patient safety and equitable implementation.
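The accuracy and comprehensiveness figures above are proportions over the 4-point grading scale, stratified by subject. The minimal sketch below shows that tally. It assumes, without confirmation from the abstract, that "accurate" means a grade of 1 or 2 (comprehensive, or correct but inadequate); the function names and the (subject, grade) record format are hypothetical.

```python
from collections import Counter, defaultdict

# The study's 4-point scale; labels copied from the abstract.
SCALE = {
    1: "comprehensive",
    2: "correct but inadequate",
    3: "some correct and some incorrect",
    4: "completely incorrect",
}


def summarize_grades(grades: list[int]) -> dict[str, int]:
    """Counts and whole-number percentages of accurate and comprehensive grades."""
    if not grades:
        raise ValueError("no grades to summarize")
    unknown = set(grades) - set(SCALE)
    if unknown:
        raise ValueError(f"grades outside the 1-4 scale: {sorted(unknown)}")
    counts = Counter(grades)
    n = len(grades)
    accurate = counts[1] + counts[2]  # assumption: grades 1-2 count as accurate
    comprehensive = counts[1]
    # round() rounds exact halves to even; the paper's rounding rule is not stated.
    return {
        "n": n,
        "accurate": accurate,
        "accurate_pct": round(100 * accurate / n),
        "comprehensive": comprehensive,
        "comprehensive_pct": round(100 * comprehensive / n),
    }


def summarize_by_subject(records: list[tuple[str, int]]) -> dict[str, dict[str, int]]:
    """Group (subject, grade) pairs and summarize each subject separately."""
    by_subject: dict[str, list[int]] = defaultdict(list)
    for subject, grade in records:
        by_subject[subject].append(grade)
    return {subject: summarize_grades(g) for subject, g in by_subject.items()}
```

For example, `summarize_grades([1, 1, 2, 3, 4])` reports 3 of 5 (60%) accurate and 2 of 5 (40%) comprehensive responses; passing (subject, grade) pairs to `summarize_by_subject` gives the per-subject breakdown in the same form.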



