Multiple-choice Examination Research Articles

Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves physician diagnostic reasoning. This study aimed to assess the effect of an LLM on physicians' diagnostic reasoning compared with conventional resources. A single-blind randomized clinical trial was conducted from November 29 to December 29, 2023. Using remote video conferencing and in-person participation across multiple academic medical institutions, physicians with training in family medicine, internal medicine, or emergency medicine were recruited. Participants were randomized either to access the LLM in addition to conventional diagnostic resources or to use conventional resources only, stratified by career stage. Participants were allocated 60 minutes to review up to 6 clinical vignettes. The primary outcome was performance on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus. Secondary outcomes included time spent per case (in seconds) and final diagnosis accuracy. All analyses followed the intention-to-treat principle. A secondary exploratory analysis evaluated the standalone performance of the LLM by comparing the primary outcomes between the LLM-alone group and the conventional resources group. Fifty physicians (26 attendings, 24 residents; median years in practice, 3 [IQR, 2-8]) participated virtually as well as at 1 in-person site. The median diagnostic reasoning score per case was 76% (IQR, 66%-87%) for the LLM group and 74% (IQR, 63%-84%) for the conventional resources-only group, with an adjusted difference of 2 percentage points (95% CI, -4 to 8 percentage points; P = .60). The median time spent per case was 519 seconds (IQR, 371-668 seconds) for the LLM group, compared with 565 seconds (IQR, 456-788 seconds) for the conventional resources group, with a time difference of -82 seconds (95% CI, -195 to 31 seconds; P = .20). The LLM alone scored 16 percentage points (95% CI, 2-30 percentage points; P = .03) higher than the conventional resources group. In this trial, the availability of an LLM to physicians as a diagnostic aid did not significantly improve clinical reasoning compared with conventional resources. The LLM alone demonstrated higher performance than both physician groups, indicating the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice. ClinicalTrials.gov Identifier: NCT06157944.

Three-dimensional (3D) printed cardiac models are increasingly being used for medical education, simulation and training, communication, surgical planning and research. Given the complexities of congenital cardiac anatomy, 3D printing is well suited as an adjunct to traditional teaching methods. This study aims to explore the influence of 3D printed cardiac models as a teaching aid for nurses and paediatric trainees. We hypothesise that using 3D models as an adjunct to didactic teaching methods improves knowledge and confidence levels of participants, regardless of their cardiology experience. A prospective randomised study was performed recruiting paediatric nurses and doctors at a tertiary paediatric hospital. All participants undertook traditional congenital cardiac teaching describing normal cardiac anatomy and seven congenital lesions of increasing complexity (atrial septal defect, ventricular septal defect, vascular ring, partial anomalous pulmonary venous return, tetralogy of Fallot, transposition of the great arteries, and double outlet right ventricle). The intervention group received an additional recorded demonstration while handling 3D printed models of a normal heart and the same lesions. Pre- and post-intervention assessments were completed using a subjective Likert-scale questionnaire and objective multiple-choice examination. A total of 73 health practitioners (30 cardiac nurses and 43 paediatric trainees) were included. Subjective knowledge and confidence levels substantially improved in the intervention group (both p<0.001), with no differences observed in the control group. Greater improvement in both subjective and objective post-test scores was observed in the intervention group. A pronounced difference between pre- and post-teaching objective examination scores was found in both groups (p=0.002), with larger improvements observed in the intervention group. The mean score in the intervention group after teaching increased by 4.27 (21.4% improvement), as opposed to 2.28 (11.4% improvement) in the control group. There was no difference in pre-test score or post-test improvement based on previous cardiology experience. Three-dimensional (3D) printed cardiac models, when used as an adjunct to traditional teaching methods, substantially improve knowledge and confidence levels of health professionals on a range of congenital cardiac lesions. These models enhance the learners' educational experience and understanding of cardiac anatomy by overcoming the limitation of two-dimensional representations of 3D structures.

Articles published on Multiple-choice Examination

Strengthening the Multiple-Choice Assessment: Improving Item-Writing Skills of Physician Assistant Educators.

Large Language Model Influence on Diagnostic Reasoning

Evaluation of Skin of Color Curricular Content for Physician Assistant Education.

Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination

The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions

Use of Multiple-Choice Items in Summative Examinations: Questionnaire Survey Among German Undergraduate Dental Training Programs.

Mathematics self-concept and response pattern in higher education examinations: differences between genders

Developing an educational “hub”: impact of a distance-learning curriculum in a multinational cohort

Android Quiz Application Development

Can ChatGPT pass the MRCP (UK) written examinations? Analysis of performance and errors using a clinical decision-reasoning framework

GPT-4 passes the bar exam.

Development and evaluation of a novel knowledge assessment tool for pediatric emergency medicine clerkships.

Beyond multiple choice: Clinical simulation as a rigorous and inclusive method for assessing genetic counseling competencies.

A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology-Head and Neck Surgery Certification Examinations: Performance Study.

The effect of the attitude towards risk/ambiguity on examination grades: cross-sectional study in a Portuguese medical school

Contribution of critical thinking, science process skills towards learning outcomes based on gender

3D Printed Cardiac Models as an Adjunct to Traditional Teaching of Anatomy in Congenital Heart Disease—A Randomised Controlled Study

A Team-based Learning Approach During Pediatric Clerkship to Promote Clinical Reasoning.

Comparing the Learning Outcomes of Online versus Face-to-Face Formats in Teaching a Marketing Course: An Empirical Observation

Gender bias in first-year multiple-choice physics examinations
