Computerized diagnostic decision support systems- a comparative performance study of Isabel Pro vs. ChatGPT4.

Joe M Bridges

doi:10.1515/dx-2024-0033

Abstract

This study aimed to validate the diagnostic accuracy of the Artificial Intelligence Large Language Model ChatGPT4 by comparing the diagnosis lists it produced with those of Isabel Pro.

The study used 201 cases, and the inputs to both systems were identical. Mean Reciprocal Rank (MRR) was used to compare the rank of the correct diagnosis between systems. Isabel Pro ranks by the frequency with which the symptoms appear in its reference dataset. The mechanism ChatGPT4 uses to rank the diagnoses is unknown. A Wilcoxon Signed Rank Sum test failed to reject the null hypothesis.

Both systems produced comprehensive differential diagnosis lists. Isabel Pro's list appears immediately upon submission, while ChatGPT4 takes several minutes. Isabel Pro produced 175 (87.1 %) correct diagnoses and ChatGPT4 165 (82.1 %). The MRR for ChatGPT4 was 0.428 (rank 2.31), and for Isabel Pro 0.389 (rank 2.57), an average rank of three for each. ChatGPT4 outperformed on Recall at Rank 1, 5, and 10, while Isabel Pro outperformed at 20, 30, and 40. The Wilcoxon Signed Rank Sum Test confirmed that the sample size was inadequate to conclude that the systems are equivalent. ChatGPT4 fabricated citations and DOIs, producing 145 correct references (87.9 %) but only 52 correct DOIs (31.5 %).

This study validates the promise of Clinical Diagnostic Decision Support Systems, including the Large Language Model form of artificial intelligence (AI). Until the issue of hallucination of references and, perhaps, diagnoses is resolved in favor of absolute accuracy, clinicians will make cautious use of Large Language Model systems in diagnosis, if at all.
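For readers unfamiliar with the two ranking metrics named in the abstract, the following is a minimal sketch of how Mean Reciprocal Rank and Recall at Rank k are commonly computed. It is not the study's code. The function names and the example ranks are hypothetical, and scoring a case as 0 when the correct diagnosis is missing from the list is an assumption; the abstract does not say how such cases were handled.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all cases.

    Each entry is the 1-based position of the correct diagnosis in a
    system's differential list, or None if it did not appear at all.
    Assumption: a missing diagnosis contributes 0 to the sum.
    """
    total = sum(0.0 if r is None else 1.0 / r for r in ranks)
    return total / len(ranks)


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose correct diagnosis is ranked k or better."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for five cases (illustrative only, not study data).
example_ranks = [1, 3, None, 2, 10]

mrr = mean_reciprocal_rank(example_ranks)
recall_1 = recall_at_k(example_ranks, 1)
recall_5 = recall_at_k(example_ranks, 5)
recall_10 = recall_at_k(example_ranks, 10)
```

Because MRR weights the top positions most heavily, a system can lead on MRR and on Recall at Rank 1, 5, and 10 while trailing on Recall at Rank 20 and beyond, which is the pattern the abstract reports.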

Journal: Diagnosis (Berlin, Germany)
Publication Date: May 7, 2024
Citations: 4
License type: CC BY 4.0

Similar Papers

Accuracy and usability of a diagnostic decision support system in the diagnosis of three representative rheumatic diseases: a randomized controlled trial among medical students
Johannes Knitza ... Arnd Kleyer
Arthritis Research & Therapy | VOL. 23
06 Sep 2021

The introduction of a diagnostic decision support system (DXplain™) into the workflow of a teaching hospital service can decrease the cost of service for diagnostically challenging Diagnostic Related Groups (DRGs)
Peter L Elkin ... G Octo Barnett
International Journal of Medical Informatics | VOL. 79
14 Oct 2010

Medical diagnostic decision support systems--past, present, and future: a threaded bibliography and brief commentary.
R A Miller
Journal of the American Medical Informatics Association | VOL. 1
01 Jan 1993

Short paths to diagnosis with artificial intelligence: systematic literature review on diagnostic decision support systems
Julia Sellin ... Jean Tori Pantel
Schmerz (Berlin, Germany) | VOL. 38
02 Jan 2024
