Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists.

Che-Sheng Chu, Chih-Sung Liang, Chih-Wei Hsu, Dian-Jeng Li, Kuan-Pin Su, Shih-Jen Tsai, Szu-Wei Cheng, Ta-Chuan Yeh, Tien-Wei Hsu, Ya-Mei Bai, Yu-Chen Kao

doi:10.1111/pcn.13656

Abstract

Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, their potential application in psychiatry has not been well studied. In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 on the 2022 Taiwan Psychiatric Licensing Examination, conducted in Mandarin (Traditional Chinese script). In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis. Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination (score 69; a score of ≥ 60 is required to pass), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in 'Pathophysiology & Epidemiology' (χ2 = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ2 = 15.8, P < 0.001). In differential diagnosis, the 24 experienced psychiatrists scored higher on average (mean 6.1, standard deviation 1.9) than GPT-4 (5), Bard (3), and Llama-2 (1). Compared with Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Moreover, GPT-4's differential-diagnosis ability closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed the most promising potential as a valuable tool in psychiatric practice.

Journal: Psychiatry and Clinical Neurosciences
Publication Date: Feb 26, 2024
Citations: 7

Similar Papers

ChatGPT & clinical decision-making: a cross-sectional study on Italian Medical Residency Test
F Conrado ... G Scaioli
European Journal of Public Health | VOL. 33
24 Oct 2023

The Application of Large Language Models in Gastroenterology: A Review of the Literature.
Marcello Maida ... Daryl Ramai
Cancers | VOL. 16
28 Sep 2024

Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.
Ali Abbas ... Mahad S Rehman
Cureus | VOL. 16
11 Mar 2024

A systematic review of large language models and their implications in medical education.
Harrison C Lucas ... Jamie R Robinson
Medical Education | VOL. 58
19 Apr 2024