Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

Soshi Takagi, Kota Sakaguchi, Takashi Watari, Ayano Erabi

doi:10.2196/48002

Abstract

Background: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.

Objective: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.

Methods: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions.

Results: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.

Conclusions: GPT-4 could become a valuable tool for medical education and clinical support in non–English-speaking regions, such as Japan.

Journal: JMIR Medical Education	Publication Date: Jun 29, 2023
Citations: 111	License type: cc-by
