Medical Knowledge Test Research Articles

Large language model (LLM)-based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and language generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLMs to that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools. This study aimed to quantitatively and qualitatively compare the performance of GPT-3.5, GPT-4, and Family Medicine residents on a multiple-choice medical knowledge test appropriate for the level of a Family Medicine resident. An official University of Toronto Department of Family and Community Medicine Progress Test consisting of multiple-choice questions was entered into GPT-3.5 and GPT-4. The artificial intelligence chatbots' responses were manually reviewed to determine the selected answer, response length, response time, provision of a rationale for the outputted response, and the root cause of all incorrect responses (classified into arithmetic, logical, and information errors). The performance of the artificial intelligence chatbots was compared against that of a cohort of Family Medicine residents who concurrently attempted the test. GPT-4 performed significantly better than GPT-3.5 (difference 25.0%, 95% CI 16.3%-32.8%; McNemar test: P<.001); it correctly answered 89/108 (82.4%) questions, while GPT-3.5 answered 62/108 (57.4%) questions correctly. Further, GPT-4 scored higher across all 11 categories of Family Medicine knowledge. In 86.1% (n=93) of the responses, GPT-4 provided a rationale for why other multiple-choice options were not chosen, compared with 16.7% (n=18) for GPT-3.5. 
Qualitatively, for both GPT-3.5 and GPT-4 responses, logical errors were the most common, while arithmetic errors were the least common. The average performance of Family Medicine residents was 56.9% (95% CI 56.2%-57.6%). The performance of GPT-3.5 was similar to that of the average Family Medicine resident (P=.16), while the performance of GPT-4 exceeded that of the top-performing Family Medicine resident (P<.001). GPT-4 significantly outperforms both GPT-3.5 and Family Medicine residents on a multiple-choice medical knowledge test designed for Family Medicine residents. GPT-4 provides a logical rationale for its response choice, ruling out other answer choices efficiently and with concise justification. Its high degree of accuracy and advanced reasoning capabilities facilitate its potential applications in medical education, including the creation of exam questions and scenarios as well as serving as a resource for medical knowledge or information on community services.
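The headline percentages in this abstract follow directly from the reported counts out of 108 questions. The short Python sketch below recomputes them as an arithmetic check; the variable and function names are illustrative and not taken from the study. It does not reproduce the McNemar test or the confidence interval, because the paired per-question results those require are not given in the abstract.

```python
# Counts reported in the abstract (108 multiple-choice questions in total).
TOTAL_QUESTIONS = 108
correct = {"GPT-4": 89, "GPT-3.5": 62}
rationale_for_other_options = {"GPT-4": 93, "GPT-3.5": 18}


def percent(count, total=TOTAL_QUESTIONS):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Share of questions answered correctly by each model.
accuracy = {model: percent(n) for model, n in correct.items()}

# Share of responses that explained why the other options were rejected.
rationale_rate = {model: percent(n) for model, n in rationale_for_other_options.items()}

# Difference in accuracy, in percentage points.
accuracy_gap = percent(correct["GPT-4"] - correct["GPT-3.5"])

print(accuracy, rationale_rate, accuracy_gap)
```

The recomputed values match the figures quoted in the abstract (82.4% and 57.4% accuracy, a 25.0-point difference, and rationale rates of 86.1% and 16.7%).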

This study aimed to examine the effectiveness of an asynchronous learning tool consisting of web-based lectures for trainees covering major topics pertinent to pediatric emergency medicine (PEM) and to assess how residents and students evaluated this mode of education. PEM faculty and fellows created a 21-lecture, web-based curriculum. These 20-minute online lectures used Microsoft PowerPoint with the voice-over feature. A 75-question test was created to assess the effectiveness of the web-based learning model, administered online before and after the rotation in the pediatric emergency department (PED). All fourth-year medical students and residents (across all specialties) rotating through the PED were required to complete 10 of the 21 lectures during their 1-month rotation. The main outcome variable was the difference in score between pre- and post-rotation tests of participants who viewed no lectures and those who viewed at least one lecture. The program was evaluated by anonymous survey using 5-point discrete visual analog scales. Responses of 4 or 5 were considered positive for analysis. One hundred eleven residents and fourth-year medical students participated in the program. An initial 32 completed testing before implementation of the online lectures (March 2007-August 2007), and another five did not complete the online lectures after implementation (September 2007-February 2008). Seventy-one completed testing and online lectures, and all but three completed at least 10 online lectures during their rotation. Fourteen of 111 trainees did not complete the pre- or post-test (including two who viewed the lectures). The mean change in score was a 1% improvement from pre-test to post-test for trainees who viewed no lectures and a 6.2% improvement for those who viewed the lectures (mean difference = 5.2%, 95% confidence interval = 2.5% to 7.9%). 
In the linear regression model, the estimate of the coefficient was 0.43 (p < 0.001), meaning that, for each lecture viewed, the post-test score rose by 0.43%. Sixty-nine of 75 test items (92%) had a point biserial correlation greater than 0.15. Thirty of the 72 trainees who completed the online lectures and testing (42%) returned surveys. All were comfortable using the Internet, and 87% (26/30) found the website easy to use. All felt that their educational goals were met, and 100% felt that the format would be useful in other areas of education. Although not a replacement for traditional bedside teaching, the use of web-based lectures as an asynchronous learning tool has a positive effect on medical knowledge test scores. Trainees were able to view online lectures on their own schedules, in the location of their choice. This is helpful in a field with shift work, in which trainees rarely work together, making it difficult to synchronously provide lectures to all trainees.
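As a rough illustration of how the reported regression coefficient can be read, the Python sketch below multiplies the per-lecture estimate by a number of lectures viewed. The function name `expected_gain` is hypothetical, and the calculation assumes the linear relationship holds across the full range of 0 to 21 lectures; the abstract does not report the model intercept or any other covariates, so this gives only the marginal contribution attributed to lectures, not a predicted score.

```python
# Figures reported in the abstract.
GAIN_PER_LECTURE = 0.43      # regression coefficient, percentage points per lecture
REQUIRED_LECTURES = 10       # lectures each trainee was required to complete
TOTAL_LECTURES = 21          # lectures in the full curriculum
CHANGE_WITH_LECTURES = 6.2   # mean pre/post change, trainees who viewed lectures
CHANGE_NO_LECTURES = 1.0     # mean pre/post change, trainees who viewed none


def expected_gain(lectures_viewed, slope=GAIN_PER_LECTURE):
    """Marginal post-test gain implied by the slope alone (assumes linearity)."""
    return round(slope * lectures_viewed, 2)


# Gain implied by the slope for the required and the full set of lectures.
gain_required = expected_gain(REQUIRED_LECTURES)
gain_all = expected_gain(TOTAL_LECTURES)

# Mean difference between the two groups, as quoted in the abstract.
mean_difference = round(CHANGE_WITH_LECTURES - CHANGE_NO_LECTURES, 1)

print(gain_required, gain_all, mean_difference)
```

Under these assumptions, the slope implies a gain of about 4.3 percentage points for the 10 required lectures, which is of the same order as the 5.2-point mean difference between the groups.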

Articles published on Medical Knowledge Test

Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study.

Cognitive perspectives on maintaining physicians' medical expertise: II. Acquiring, maintaining, and updating cognitive skills.

Knowledge of pressure injury in medical and surgical nurses in a tertiary level hospital: A cross-sectional study

A Validity Framework for Effective Analysis and Interpretation of Milestones Data.

Two Indias: The structure of primary health care markets in rural Indian villages with implications for policy

Using an Adaptive, Self-Directed Web-Based Learning Module to Enhance Residents' Medical Knowledge Prior to a New Clinical Rotation.

Resident Competency and Proficiency in Combined Spinal-Epidural Catheter Placement Is Improved Using a Computer-Enhanced Visual Learning Program: A Randomized Controlled Trial.

A holistic remediation intervention for struggling undergraduate medical students affords sustained Progress Test performance recovery

An analysis of reflective writing early in the medical curriculum: The relationship between reflective capacity and academic achievement

A model of cardiopulmonary bypass staged training integrating technical and non-technical skills dedicated to cardiac trainees.

The survey on disaster medical rescue capacity of medical staffs in urban community health service centers of Jilin province

Using automatic item generation to create multiple-choice test items

P03.01. Characteristics of residents and training sites influence successful completion of the Integrative Medicine in Residency program

Integrative Medicine in Residency Education: Developing Competency Through Online Curriculum Training

University of New Mexico School of Medicine

Evaluation of a Web‐based Asynchronous Pediatric Emergency Medicine Learning Tool for Residents and Medical Students

Are multiple choice tests fair to medical students with specific learning disabilities?

Design, Implementation, and Outcome of a Hands-on Arthrocentesis Workshop

Quality of care evaluation in France

Didactic Teaching Conferences for IM Residents
