Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks

Sarah Sandmann, Sarah Riepenhausen, Lucas Plagwitz, Julian Varghese

doi:10.1038/s41467-024-46411-8

Open Access

https://doi.org/10.1038/s41467-024-46411-8

Journal: Nature Communications
Publication Date: Mar 6, 2024
Citations: 14
License type: CC BY 4.0

Affiliation: University of Münster

Abstract

It is likely that individuals are turning to Large Language Models (LLMs) to seek health advice, much like searching for diagnoses on Google. We evaluate the clinical accuracy of GPT-3.5 and GPT-4 in suggesting an initial diagnosis, examination steps, and treatment for 110 medical cases across diverse clinical disciplines. Moreover, two model configurations of the Llama 2 open-source LLMs are assessed in a sub-study. To benchmark the diagnostic task, we conduct a naïve Google search for comparison. Overall, GPT-4 performed best, outperforming GPT-3.5 on diagnosis and examination and outperforming Google on diagnosis. Except for treatment, all three approaches performed better on frequent than on rare diseases. The sub-study indicates slightly lower performance for the Llama models. In conclusion, the commercial LLMs show growing potential for medical question answering across two successive major releases. However, some weaknesses underscore the need for robust and regulated AI models in health care. Open-source LLMs can be a viable option to address specific needs regarding data privacy and transparency of training.
