Correct Responses Research Articles

This research explores the capabilities of ChatGPT-4 in passing the American Board of Family Medicine (ABFM) Certification Examination. Addressing a gap in existing literature, where earlier artificial intelligence (AI) models showed limitations in medical board examinations, this study evaluates the enhanced features and potential of ChatGPT-4, especially in document analysis and information synthesis. The primary goal is to assess whether ChatGPT-4, when provided with extensive preparation resources and when using sophisticated data analysis, can achieve a score equal to or above the passing threshold for the Family Medicine Board Examinations. In this study, ChatGPT-4 was embedded in a specialized subenvironment, "AI Family Medicine Board Exam Taker," designed to closely mimic the conditions of the ABFM Certification Examination. This subenvironment enabled the AI to access and analyze a range of relevant study materials, including a primary medical textbook and supplementary web-based resources. The AI was presented with a series of ABFM-type examination questions, reflecting the breadth and complexity typical of the examination. Emphasis was placed on assessing the AI's ability to interpret and respond to these questions accurately, leveraging its advanced data processing and analysis capabilities within this controlled subenvironment. In our study, ChatGPT-4's performance was quantitatively assessed on 300 practice ABFM examination questions. The AI achieved a correct response rate of 88.67% (95% CI 85.08%-92.25%) for the Custom Robot version and 87.33% (95% CI 83.57%-91.10%) for the Regular version. Statistical analysis, including the McNemar test (P=.45), indicated no significant difference in accuracy between the 2 versions. In addition, the chi-square test for error-type distribution (P=.32) revealed no significant variation in the pattern of errors across versions. These results highlight ChatGPT-4's capacity for high-level performance and consistency in responding to complex medical examination questions under controlled conditions. The study demonstrates that ChatGPT-4, particularly when equipped with specialized preparation and when operating in a tailored subenvironment, shows promising potential in handling the intricacies of medical board examinations. While its performance is comparable with the expected standards for passing the ABFM Certification Examination, further enhancements in AI technology and tailored training methods could push these capabilities to new heights. This exploration opens avenues for integrating AI tools such as ChatGPT-4 in medical education and assessment, emphasizing the importance of continuous advancement and specialized training in medical applications of AI.

The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, which shows great potential in medical education due to its powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance in handling questions for the National Nursing Licensure Examination (NNLE) in China and the United States, including the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the NNLE. This study aims to examine how well LLMs respond to the NCLEX-RN and the NNLE multiple-choice questions (MCQs) in various language inputs. To evaluate whether LLMs can be used as multilingual learning assistance for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice. First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate NCLEX-RN questions from English to Chinese and NNLE questions from Chinese to English. Finally, the original version and the translated version of the MCQs were inputted into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. Different LLMs were compared according to the accuracy rate, and the differences between different language inputs were compared. The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Despite the statistical significance of the difference (P=.03), the correct rate was generally satisfactory. Around 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs were correctly answered by ChatGPT 4.0. The accuracy of ChatGPT 4.0 in processing NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, and there was no statistically significant difference between the results of text input in different languages. ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates for nursing-related MCQs than ChatGPT 4.0 in English input. English accuracy was higher when compared with ChatGPT 3.5's Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether submitted in Chinese or English, the MCQs from the NCLEX-RN and NNLE demonstrated that ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs. This study, focusing on 618 nursing MCQs including NCLEX-RN and NNLE exams, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It excelled in processing English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making.

Correct Responses Research Articles

Related Topics

Articles published on Correct Responses

Evaluation of a Large Language Model on the American Academy of Pediatrics' PREP Emergency Medicine Question Bank

Evaluation of a distribution, education and awareness intervention for child passenger safety in Lebanon: a low-income and middle-income country setting

Developmental patterns of inhibition and fronto-basal-ganglia white matter organisation in healthy children and children with attention-deficit/hyperactivity disorder.

Knowledge, attitude and perception of Italian dental students toward HPV-related oropharyngeal cancer and vaccination: a cross-sectional study

Emergency imaging protocols for pregnant patients: a multi-institutional and multi- specialty comparison of physician education.

Toward diversification of acute stressors and precision stress research: A stage 2 Registered Report validating a reward-salient stress task in emerging adults.

Radiation information, protection and awareness of healthcare professionals in terms of occupational health and safety: a two–center survey study

Experience of caring for patients with COVID-19 and educational achievement among Japanese resident physicians: a nationwide survey with general medicine in-training examination

Reliability of the Revised Neurophysiology of Pain Questionnaire-Turkish in patients with neck pain: a cross-validation study

Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.

The effect of knowledge and person-related factors on breast cancer susceptibility genes (BRCA1/2) testing perception in Turkish women.

Survey of Knowledge, Attitudes, and Levels of Confidence Regarding Age-Related Hyperkyphosis and Its Management among Thai Physiotherapists.

Awareness of chronic hepatitis b and its barriers to treatment in patient with HBV infection.

Empatica E4 Assessment of Child Physiological Measures of Listening Effort During Remote and In-Person Communication.

Implementing the service-learning methodology in nursing education: A case study

Eye-tracking training improves visuospatial working memory of children with attention-deficit/hyperactivity disorder and autism spectrum disorder.

Are faculty members capable of finding the needed information? A study of selected universities of Jammu and Kashmir

Field-Testing Multiple-Choice Questions With AI Examinees: English Grammar Items

Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.

Performance of ChatGPT on Hand Surgery Board-Style Examination Questions

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Correct Responses Research Articles

Related Topics

Articles published on Correct Responses

Evaluation of a Large Language Model on the American Academy of Pediatrics' PREP Emergency Medicine Question Bank

Evaluation of a distribution, education and awareness intervention for child passenger safety in Lebanon: a low-income and middle-income country setting

Developmental patterns of inhibition and fronto-basal-ganglia white matter organisation in healthy children and children with attention-deficit/hyperactivity disorder.

Knowledge, attitude and perception of Italian dental students toward HPV-related oropharyngeal cancer and vaccination: a cross-sectional study

Emergency imaging protocols for pregnant patients: a multi-institutional and multi- specialty comparison of physician education.

Toward diversification of acute stressors and precision stress research: A stage 2 Registered Report validating a reward-salient stress task in emerging adults.

Radiation information, protection and awareness of healthcare professionals in terms of occupational health and safety: a two–center survey study

Experience of caring for patients with COVID-19 and educational achievement among Japanese resident physicians: a nationwide survey with general medicine in-training examination

Reliability of the Revised Neurophysiology of Pain Questionnaire-Turkish in patients with neck pain: a cross-validation study

Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.

The effect of knowledge and person-related factors on breast cancer susceptibility genes (BRCA1/2) testing perception in Turkish women.

Survey of Knowledge, Attitudes, and Levels of Confidence Regarding Age-Related Hyperkyphosis and Its Management among Thai Physiotherapists.

Awareness of chronic hepatitis b and its barriers to treatment in patient with HBV infection.

Empatica E4 Assessment of Child Physiological Measures of Listening Effort During Remote and In-Person Communication.

Implementing the service-learning methodology in nursing education: A case study

Eye-tracking training improves visuospatial working memory of children with attention-deficit/hyperactivity disorder and autism spectrum disorder.

Are faculty members capable of finding the needed information? A study of selected universities of Jammu and Kashmir

Field-Testing Multiple-Choice Questions With AI Examinees: English Grammar Items

Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.

Performance of ChatGPT on Hand Surgery Board-Style Examination Questions