The Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Antidiabetic Medication under Conditions of Clinical Uncertainty.

James H Flory, Jessica S Ancker, Scott Y H Kim, Gilad Kuperman, Aleksandr Petrov, Andrew Vickers

doi:10.2337/dc24-1067

Abstract

This study explored how the commercially available large language model (LLM) GPT-4 compares with endocrinologists when addressing medical questions for which the best answer is uncertain. Responses from GPT-4 were compared with responses from 31 endocrinologists using hypothetical clinical vignettes focused on diabetes, specifically examining the prescription of metformin versus alternative treatments. The primary outcome was the choice between metformin and other treatments. With a simple prompt, GPT-4 chose metformin in 12% (95% CI 7.9-17%) of responses, compared with 31% (95% CI 23-39%) of endocrinologist responses. After the prompt was modified to encourage metformin use, the selection of metformin by GPT-4 increased to 25% (95% CI 22-28%). GPT-4 rarely selected metformin for patients with impaired kidney function or a history of gastrointestinal distress (2.9% of responses, 95% CI 1.4-5.5%). In contrast, endocrinologists often prescribed metformin even for patients with a history of gastrointestinal distress (21% of responses, 95% CI 12-36%). GPT-4 responses showed low variability on repeated runs except at intermediate levels of kidney function. In clinical scenarios with no single right answer, GPT-4's responses were reasonable but differed from endocrinologists' responses in clinically important ways. Value judgments are needed to determine when these differences should be addressed by adjusting the model. We recommend against reliance on LLM output until it is shown to align not just with clinical guidelines but also with patient and clinician preferences, or until it demonstrates improvement in clinical outcomes over standard of care.


More From: Diabetes care

09 Sep 2024
