Correct Answer Rate Research Articles

Purpose: Large language models (LLMs) are increasingly used across many fields, including medicine and dentistry. In dental anesthesiology, LLMs are expected to improve the efficiency of information gathering, patient outcomes, and education. This study evaluated the performance of different LLMs in answering questions from the Japanese Dental Society of Anesthesiology Board Certification Examination (JDSABCE) to determine their utility in dental anesthesiology.

Methods: The study assessed three LLMs, ChatGPT-4 (OpenAI, San Francisco, California, United States), Gemini 1.0 (Google, Mountain View, California, United States), and Claude 3 Opus (Anthropic, San Francisco, California, United States), using multiple-choice questions from the 2020 to 2022 JDSABCE exams. Each LLM answered every question three times. Questions involving figures or deemed inappropriate were excluded. The primary outcome was the accuracy rate of each LLM; the secondary analysis covered six subgroups: (1) basic physiology necessary for general anesthesia, (2) local anesthesia, (3) sedation and general anesthesia, (4) diseases and patient management methods that pose challenges in systemic management, (5) pain management, and (6) shock and cardiopulmonary resuscitation. Statistical analysis used one-way ANOVA with Dunnett's multiple comparisons, with a significance threshold of p<0.05.

Results: ChatGPT-4 achieved a correct answer rate of 51.2% (95% CI: 42.78-60.56, p=0.003) and Claude 3 Opus 47.4% (95% CI: 43.45-51.44, p<0.001); both were significantly higher than Gemini 1.0, which had a rate of 30.3% (95% CI: 26.53-34.14). In subgroup analyses, ChatGPT-4 and Claude 3 Opus outperformed Gemini 1.0 in basic physiology, sedation and general anesthesia, and systemic management challenges. Notably, ChatGPT-4 performed particularly well on questions related to systemic management (62.5%), and Claude 3 Opus on pain management (61.53%).

Conclusions: ChatGPT-4 and Claude 3 Opus show potential for use in dental anesthesiology, outperforming Gemini 1.0. However, their current accuracy rates are insufficient for reliable clinical use. These findings have implications for dental anesthesiology practice and education, including educational support, clinical decision support, and continuing education. To enhance LLM utility in dental anesthesiology, it is crucial to increase the availability of high-quality information online and to refine prompt engineering to better guide LLM responses.
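The accuracy figures in this abstract summarize three repeated runs per model. The following is a minimal Python sketch of one common way to produce such a summary, assuming each run is scored against an answer key and the 95% CI is a Student's t-interval over the per-run rates. The abstract does not state the authors' exact CI method, and the function names, answer key, and run data below are hypothetical. The between-model comparison (one-way ANOVA with Dunnett's test) is not reproduced here; in Python, SciPy offers `scipy.stats.f_oneway` and, from version 1.11, `scipy.stats.dunnett` for that step.

```python
import statistics

# Hypothetical helper names; not taken from the study.
# Two-sided 95% critical value of Student's t for 2 degrees of freedom
# (three repeated runs -> n - 1 = 2). Hard-coded to stay stdlib-only.
T_CRIT_95_DF2 = 4.303


def accuracy_rate(answers, key):
    """Return the percentage of answers that match the answer key."""
    if not key or len(answers) != len(key):
        raise ValueError("answers and key must be non-empty and equal in length")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


def mean_with_ci(rates, t_crit=T_CRIT_95_DF2):
    """Return (mean, lower, upper) of per-run rates using a t-based 95% CI.

    The default t_crit is only valid for exactly three runs.
    """
    mean = statistics.mean(rates)
    sem = statistics.stdev(rates) / len(rates) ** 0.5  # sample SD / sqrt(n)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical 5-question answer key and three runs of one model.
    key = ["A", "C", "B", "D", "E"]
    runs = [
        ["A", "C", "D", "A", "A"],
        ["A", "C", "B", "A", "A"],
        ["B", "C", "B", "D", "A"],
    ]
    rates = [accuracy_rate(run, key) for run in runs]
    print(rates)
    print(mean_with_ci(rates))
```

With only three runs the t-interval is wide, and for rates near 0% or 100% it can extend past the valid range, so a real analysis may prefer a different interval.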

Background: This study aimed to evaluate the performance of ChatGPT on the medical specialization exam (MSE), which medical graduates take when choosing their postgraduate specialization, and to show how artificial intelligence-supported education can improve the quality and academic success of medical education. The research also explored the potential applications and advantages of artificial intelligence in medical education and the ways this technology can contribute to student learning and exam preparation.

Methodology: A total of 240 MSE questions were posed to ChatGPT: 120 basic medical sciences questions and 120 clinical medical sciences questions. A total of 18,481 people took the exam. The performance of medical school graduates in answering these questions correctly was compared with that of ChatGPT-3.5. The average score for ChatGPT-3.5 was calculated by averaging its minimum and maximum scores. Calculations were performed in R 4.0.2.

Results: Graduates' average scores ranged from a minimum of 7.51 to a maximum of 81.46 in basic sciences and from a minimum of 12.51 to a maximum of 80.78 in clinical sciences. ChatGPT's average scores ranged from a minimum of 60.00 to a maximum of 72.00 in basic sciences and from a minimum of 66.25 to a maximum of 77.00 in clinical sciences. The rate of correct answers in basic medical sciences was 43.03% for graduates and 60.00% for ChatGPT. In clinical medical sciences, it was 53.29% for graduates and 64.16% for ChatGPT. ChatGPT performed best in Obstetrics and Gynecology, with a 91.66% correct answer rate, and in Medical Microbiology, with 86.36%. Its least successful area was Anatomy, a subfield of basic medical sciences, with a 28.00% correct answer rate. Graduates outperformed ChatGPT in the Anatomy and Physiology subfields. Significant differences were found in all comparisons between ChatGPT and graduates.

Conclusions: This study shows that artificial intelligence models such as ChatGPT can offer significant advantages to graduates, since ChatGPT scored higher than medical school graduates overall. Recommended applications include interactive support, private lessons, learning material production, personalized learning plans, self-assessment, motivation boosting, and 24/7 access. Artificial intelligence-supported education can therefore play an important role in improving the quality of medical education and increasing student success.
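This abstract reports results as correct answer rates (correct answers over questions posed, as a percentage) and scores ChatGPT-3.5 by averaging its minimum and maximum scores. A minimal Python sketch of that arithmetic follows; the function names and inputs are hypothetical and are not the study's data.

```python
def correct_answer_rate(correct, total):
    """Return the percentage of questions answered correctly."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    return 100.0 * correct / total


def midpoint_score(min_score, max_score):
    """Return the average of a minimum and a maximum score."""
    if min_score > max_score:
        raise ValueError("min_score must not exceed max_score")
    return (min_score + max_score) / 2


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(correct_answer_rate(90, 120))  # 75.0
    print(midpoint_score(62.0, 70.0))    # 66.0
```

Averaging only the minimum and maximum gives the midrange, which can differ from the mean of all individual scores when those scores are not symmetric about the midpoint.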

Articles published on Correct Answer Rate

Performance of ChatGPT-3.5 and ChatGPT-4o in the Japanese National Dental Examination.

The effect of calculus and kinematics contexts on students’ understanding of graphs

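The influence of nursing students' knowledge and perception of standard precautions and nursing professionalism on performance of standard precautions for infection control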

Hypertension management: A comparative analysis of garlic, hibiscus, hawthorn, and olive leaf and a survey

Anti-gingivitis natural products: Aloe vera gel, tea tree oil mouthwash, turmeric gel, myrrh extract/mouthwash

Human T-cell lymphotropic virus: knowledge of academics from a higher education institution

Analysis of collaborative problem-solving competency types using process data in simulation-based assessment

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.

A Pilot Study of the Computerized Brief Smell Identification Test

School children's performance on cloze texts using pictures and words

Analysis of Responses of GPT-4V to the Japanese National Clinical Engineer Licensing Examination.

Examining the competence of artificial intelligence programs in neuro-ophthalmological disorders and analyzing their comparative superiority

Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam.

Are Medical Students and Primary Health-care Professionals Aware of Neonatal Cholestasis and Acholic Stool.

Assessment Study of ChatGPT-3.5's Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions.

Alignment of Patient Information Leaflets with the Health Literacy Skills of Future End-Users: Are We on the Same Page?

A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education.

Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination

Depth estimation of pipe wall thinning using multifrequency reflection coefficients of T(0,1) mode-guided waves with supervised multilayer perceptron

Evaluation of Current Artificial Intelligence Programs on the Knowledge of Glaucoma.
