Abstract
This study evaluated the performance of four large language model (LLM)-based chatbots by comparing their results with those of dental students on an oral and maxillofacial radiology examination. ChatGPT, ChatGPT Plus, Bard, and Bing Chat were tested on 52 questions drawn from regular dental college examinations. The questions were categorized into three educational content areas (basic knowledge, imaging and equipment, and image interpretation) and classified as either multiple-choice questions (MCQs) or short-answer questions (SAQs). The chatbots' accuracy rates were compared with the students' performance, and further analysis was conducted by educational content area and question type. The students' overall accuracy rate was 81.2%, while that of the chatbots varied: 50.0% for ChatGPT, 65.4% for ChatGPT Plus, 50.0% for Bard, and 63.5% for Bing Chat. ChatGPT Plus achieved a higher accuracy rate for basic knowledge than the students (93.8% vs. 78.7%). However, all chatbots performed poorly in image interpretation, with accuracy rates below 35.0%. All chatbots scored less than 60.0% on MCQs but performed better on SAQs. Overall, the performance of the chatbots in oral and maxillofacial radiology was unsatisfactory. Further training with specific, relevant data derived solely from reliable sources is required, and the validity of the chatbots' responses must be meticulously verified.