Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.

Ryunosuke Noda, Yuto Izaki, Fumiya Kitano, Jun Komatsu, Daisuke Ichikawa, Yugo Shibagaki

doi:10.1007/s10157-023-02451-w

Open Access

Abstract

Large language models (LLMs) have contributed to recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard for potential applications in nephrology. Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates for the five years combined, for each year, and for each question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents. The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 significantly outperformed both GPT-3.5 (p<0.01) and Bard (p<0.01). GPT-4 passed in three of the five years, barely meeting the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents. Overall, GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance to inform future applications.
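The overall rates reported in the abstract are simple proportions: correctly answered questions divided by the 99 questions presented. The minimal Python sketch below reproduces that arithmetic from the reported counts. The names `TOTAL_QUESTIONS`, `correct`, and `correct_rate` are illustrative assumptions, not the authors' analysis code. The pass criterion and the significance test are not modeled, because neither the threshold value nor the per-question data are given here.

```python
# Correct-answer counts out of 99 questions, as reported in the abstract.
# Variable and function names are hypothetical, chosen for illustration only.
TOTAL_QUESTIONS = 99
correct = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}


def correct_rate(n_correct: int, n_total: int) -> float:
    """Return the correct answer rate as a percentage rounded to one decimal place."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return round(100 * n_correct / n_total, 1)


# Compute the overall rate for each model.
rates = {model: correct_rate(n, TOTAL_QUESTIONS) for model, n in correct.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}% ({correct[model]}/{TOTAL_QUESTIONS})")
```

Running it prints 31.3%, 54.5%, and 32.3% for GPT-3.5, GPT-4, and Bard, matching the figures above. Checking against the pass criterion would amount to comparing each rate with a threshold. That threshold is deliberately left out rather than guessed.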

Journal: Clinical and Experimental Nephrology
Publication Date: Feb 14, 2024
Citations: 6

Similar Papers

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

Response to M. Trengove & coll regarding "Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine".
Stefan Harrer
eBioMedicine | VOL. 93
01 Jul 2023

ChatGPT Isn't Magic
Tama Leaver ... Suzanne Srdarov
M/C Journal | VOL. 26
02 Oct 2023

Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations.
Yutaka Igarashi ... Shoji Yokobori
Journal of Nippon Medical School = Nippon Ika Daigaku zasshi | VOL. 91
25 Apr 2024