In this study, we compared the performance of ChatGPT-3.5 with that of ChatGPT-4o on the Japanese National Dental Examination, which assesses clinical reasoning skills and dental knowledge, to determine their potential usefulness in dental education. Both models were assessed using 1399 (55%) of the 2520 questions from the Japanese National Dental Examinations (the 111th to 117th examinations). The 1121 excluded questions (45%) contained figures or tables that ChatGPT could not recognize. The questions were categorized into 18 subjects based on dental specialty. Statistical analysis was performed using SPSS software, with McNemar's test applied to assess differences in performance between the two models. The percentage of correct answers was significantly higher for ChatGPT-4o (84.63%) than for ChatGPT-3.5 (45.46%), demonstrating enhanced reliability and subject knowledge. ChatGPT-4o consistently outperformed ChatGPT-3.5 across all dental subjects, with significant improvements in subjects such as oral surgery, pathology, pharmacology, and microbiology. Heatmap analysis revealed that ChatGPT-4o provided more stable and higher correct answer rates, especially for complex subjects. These findings suggest that advanced natural language processing models such as ChatGPT-4o may have sufficient clinical reasoning skills and dental knowledge to function as a supplementary tool in dental education and exam preparation.
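McNemar's test suits this design because both models answered the same questions, so the outcomes are paired and only the discordant pairs (questions that one model answered correctly and the other did not) carry information about the difference. The following is a minimal sketch of that computation in Python. It is not the authors' procedure, which was run in SPSS. The function name `mcnemar_test`, its argument names, and the toy inputs are hypothetical and are not data from the study.

```python
from math import comb, erfc, sqrt


def mcnemar_test(first_correct, second_correct):
    """McNemar's test on paired correct/incorrect outcomes.

    Both arguments are equal-length sequences of booleans, one entry per
    question, giving whether each model answered that question correctly.
    (Hypothetical helper for illustration only.)
    """
    if len(first_correct) != len(second_correct):
        raise ValueError("both models must be scored on the same questions")

    # Discordant pairs: exactly one of the two models was correct.
    b = sum(1 for x, y in zip(first_correct, second_correct) if x and not y)
    c = sum(1 for x, y in zip(first_correct, second_correct) if y and not x)
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of a difference.
        return {"b": 0, "c": 0, "statistic": 0.0, "p_chi2": 1.0, "p_exact": 1.0}

    # Continuity-corrected chi-square statistic with 1 degree of freedom.
    statistic = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    p_chi2 = erfc(sqrt(statistic / 2))

    # Exact two-sided p-value from Binomial(n, 0.5) on the discordant pairs.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)

    return {"b": b, "c": c, "statistic": statistic,
            "p_chi2": p_chi2, "p_exact": p_exact}


# Toy illustration with made-up outcomes for six questions (not study data).
toy_first = [True, False, False, False, True, False]
toy_second = [True, True, True, True, False, True]
toy_result = mcnemar_test(toy_first, toy_second)
```

With realistic sample sizes the chi-square approximation and the exact binomial p-value usually agree closely. The exact version is the safer choice when there are few discordant pairs, which could be the case within a single small subject category.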