Abstract
Objective
The use of large language models (LLMs) in medicine has recently become a prominent topic of discussion due to the rapid improvement of these tools in understanding and responding to natural language. Several models, both proprietary and open-source, are widely available to the public. We aim to evaluate the possible use of such LLMs in vascular surgery by assessing their ability to process common consult requests.

Methods
The senior author created 25 fictional vascular surgery consultation queries based on common consultation requests. Five attending surgeons and four LLMs (GPT-3.5, GPT-4, Bard, and Falcon 40B) were asked whether each consult was an emergency requiring immediate attention within an hour. Responders were also asked whether the next best step was an examination, additional imaging, or an urgent operation. GPT-3.5 and GPT-4 also provided free-response answers on the next best step, which attending surgeons graded for scientific accuracy, potential harm, and content completeness.

Results
The rates of accurate emergency identification were 88%, 100%, 76%, and 88% for GPT-3.5, GPT-4, Falcon 40B, and Bard, respectively. Although GPT-3.5 and Bard had similar overall accuracy, GPT-3.5 had high sensitivity (100%), whereas Bard had high specificity (90%). GPT-4 had 100% sensitivity and specificity. The LLMs agreed with the majority surgeon opinion on the next best step in 64% (GPT-3.5), 32% (GPT-4), 68% (Falcon 40B), and 36% (Bard) of cases. Collectively, 89.5% of GPT-3.5 and GPT-4 free-response answers adhered to the scientific consensus. Only 5% of responses were highly likely to cause clinically significant harm. Although only 4% included incorrect content, 17.5% of answers omitted important content. There was no significant difference between GPT-3.5 and GPT-4 in free-response grades.
Conclusions
Existing, widely available LLMs exhibited a solid ability to identify vascular emergencies, with GPT-4 agreeing with attending surgeons in 100% of cases. However, these models continue to show identifiable deficiencies in treatment recommendations, a higher-level task. Future models might help triage incoming consults and provide preliminary management suggestions. The utility of such tools in clinical practice remains to be explored.
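For readers less familiar with the accuracy, sensitivity, and specificity figures reported above, these follow from standard confusion-matrix definitions. The sketch below illustrates the relationships with hypothetical counts; the emergency/non-emergency split of the 25 cases is not stated in the abstract, so the numbers here are for illustration only and are not the study's actual data.

```python
# Standard confusion-matrix metrics used for binary emergency identification.
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true emergencies correctly flagged (true-positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-emergencies correctly dismissed (true-negative rate)."""
    return tn / (tn + fp)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Overall agreement rate across all cases."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical split (NOT from the study): 10 emergencies and 15
# non-emergencies, with 3 non-emergencies wrongly flagged as emergencies.
tp, fn, tn, fp = 10, 0, 12, 3
print(sensitivity(tp, fn))       # 1.0  -> every emergency caught
print(specificity(tn, fp))       # 0.8
print(accuracy(tp, tn, fp, fn))  # 0.88 -> overall accuracy can sit well
                                 #         below a perfect sensitivity
```

This illustrates why a model such as GPT-3.5 can show 100% sensitivity while its overall accuracy remains 88%: every error is a false positive, i.e., a non-emergency flagged as an emergency.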