Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports.

Cameron C Young, Ellie Enichen, Christian Rivera, Corinne A Auger, Nathan Grant, Arya Rao, Marc D Succi

doi:10.1002/ajmg.a.63878

Abstract

Accurately diagnosing rare pediatric diseases is frequently a clinical challenge because of their complex and unusual presentations. Here, we explore the capabilities of three large language models (LLMs), GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO]), by evaluating their diagnostic performance on 61 rare pediatric disease case reports. The performance of the LLMs was assessed for accuracy in identifying the specific diagnosis, including the correct diagnosis in a differential list, and identifying the broad disease category. In addition, GPT-4 HPO was tested on 100 general pediatrics case reports previously evaluated with other LLMs to further validate its performance. GPT-4 predicted the correct diagnosis with a diagnostic accuracy of 13.1%, whereas both GPT-4 HPO and Gemini Pro had diagnostic accuracies of 8.2%. Further, GPT-4 HPO performed better than the other two LLMs in including the correct diagnosis in its differential list and in identifying the broad disease category. Although these findings underscore the potential of LLMs for diagnostic support, particularly when enhanced with domain-specific ontologies, they also stress the need for further improvement before integration into clinical practice.

More From: American journal of medical genetics. Part A

Similar Papers

Evaluation of large language models as a diagnostic aid for complex medical cases.
Alejandro Ríos-Hoyo ... Frederick M Howard
Frontiers in medicine | VOL. 11
20 Jun 2024

Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools.
Justin T Reese ... Peter N Robinson
medRxiv : the preprint server for health sciences | VOL. -
07 Nov 2024

Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
Oskitz Ruiz Sarrias ... Covadonga Figaredo Berjano
Cancers | VOL. 16
12 Aug 2024

Generative AI enhanced with NCCN clinical practice guidelines for clinical decision support: A case study on bone cancer.
Yanshan Wang ... Xizhi Wu
Journal of Clinical Oncology | VOL. 42
01 Jun 2024
