Abstract

Validate the diagnostic accuracy of the Artificial Intelligence Large Language Model ChatGPT4 by comparing diagnosis lists produced by ChatGPT4 to Isabel Pro. This study used 201 cases, comparing ChatGPT4 to Isabel Pro. Systems inputs were identical. Mean Reciprocal Rank (MRR) compares the correct diagnosis's rank between systems. Isabel Pro ranks by the frequency with which the symptoms appear in the reference dataset. The mechanism ChatGPT4 uses to rank the diagnoses is unknown. A Wilcoxon Signed Rank Sum test failed to reject the null hypothesis. Both systems produced comprehensive differential diagnosis lists. Isabel Pro's list appears immediately upon submission, while ChatGPT4 takes several minutes. Isabel Pro produced 175 (87.1 %) correct diagnoses and ChatGPT4 165 (82.1 %). The MRR for ChatGPT4 was 0.428 (rank 2.31), and Isabel Pro was 0.389 (rank 2.57), an average rank of three for each. ChatGPT4 outperformed on Recall at Rank 1, 5, and 10, with Isabel Pro outperforming at 20, 30, and 40. The Wilcoxon Signed Rank Sum Test confirmed that the sample size was inadequate to conclude that the systems are equivalent. ChatGPT4 fabricated citations and DOIs, producing 145 correct references (87.9 %) but only 52 correct DOIs (31.5 %). This study validates the promise of Clinical Diagnostic Decision Support Systems, including the Large Language Model form of artificial intelligence (AI). Until the issue of hallucination of references and, perhaps diagnoses, is resolved in favor of absolute accuracy, clinicians will make cautious use of Large Language Model systems in diagnosis, if at all.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.