ChatGPT-Generated Differential Diagnosis Lists for Complex Case-Derived Clinical Vignettes: Diagnostic Accuracy Evaluation.

Takanobu Hirosawa, Yukinori Harada, Tomoharu Suzuki, Taro Shimizu, Yuki Kaji, Kazuki Tokumasu, Kazuya Mizuta, Ren Kawamura

doi:10.2196/48808

Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown. This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan.

We searched PubMed for case reports. From the identified reports, physicians selected diagnostic cases, determined the final diagnosis, and converted the cases into clinical vignettes. Physicians then entered a predetermined prompt text together with each clinical vignette into ChatGPT-3.5 and ChatGPT-4 to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, within the top 5 differential diagnosis lists, and as the top diagnosis.

In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and for the top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43). ChatGPT-4's rates were numerically higher in each comparison, but none of the differences was statistically significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022).

This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
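The comparison between ChatGPT-4 and the physicians can be illustrated with a short calculation. The sketch below recomputes the reported percentages from the counts in the abstract and runs a chi-square test on each 2x2 table. The abstract does not name the statistical test, so the use of a chi-square test with Yates continuity correction is an assumption, and the function names (`rate_percent`, `yates_chi_square_p`, `compare_all`) are hypothetical helpers introduced only for this example.

```python
import math

TOTAL = 52  # number of case reports analyzed, from the abstract

# Correct-diagnosis counts from the abstract: (ChatGPT-4, physicians)
COUNTS = {
    "top 10": (43, 39),
    "top 5": (42, 35),
    "top diagnosis": (31, 26),
}


def rate_percent(correct, total):
    """Rate of correct diagnosis as a whole-number percentage."""
    return round(100 * correct / total)


def yates_chi_square_p(a, b, c, d):
    """Chi-square test with Yates correction for the 2x2 table [[a, b], [c, d]].

    ASSUMPTION: the abstract does not state which test was used; this
    choice is illustrative only. Returns (chi2 statistic, P value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates correction: shrink |ad - bc| by n/2, never below zero
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def compare_all():
    """Return {label: (ChatGPT-4 %, physician %, P rounded to 2 places)}."""
    results = {}
    for label, (gpt4, physicians) in COUNTS.items():
        _, p_value = yates_chi_square_p(
            gpt4, TOTAL - gpt4, physicians, TOTAL - physicians
        )
        results[label] = (
            rate_percent(gpt4, TOTAL),
            rate_percent(physicians, TOTAL),
            round(p_value, 2),
        )
    return results


if __name__ == "__main__":
    for label, (gpt4_pct, phys_pct, p_value) in compare_all().items():
        print(f"{label}: ChatGPT-4 {gpt4_pct}% vs physicians {phys_pct}%, P={p_value:.2f}")
```

Under this assumed test, the P values round to .47, .18, and .43, which agrees with the figures reported in the abstract. With 52 cases per group, differences of 8 to 14 percentage points are too small to reach significance, which is why the rates are described as comparable.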

Journal: JMIR Medical Informatics	Publication Date: Oct 9, 2023
Citations: 39	License type: cc-by

Similar Papers

Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.
Takanobu Hirosawa ... Taro Shimizu
International journal of environmental research and public health | VOL. 20
15 Feb 2023

Survey of Inpatient Consultations with General Internal Medicine Physicians in a Tertiary Hospital: A Retrospective Observational Study.
Ren Kawamura ... Shogo Hanai
International Journal of General Medicine | VOL. 16
01 Apr 2023

AL amyloidosis enhances development of amyloid A amyloidosis
J.C.H Van Der Hilst ... J.P.H Drenth
British Journal of Dermatology | VOL. 156
30 Jan 2007

Can ChatGPT-4 evaluate whether a differential diagnosis list contains the correct diagnosis as accurately as a physician?
Kazuya Mizuta ... Taro Shimizu
Diagnosis (Berlin, Germany) | VOL. 11
12 Mar 2024
