Performance of GPT-4 on Chinese Nursing Examination: Potentials for AI-Assisted Nursing Education Using Large Language Models.

Yiqun Miao, Yuan Luo, Yuhan Zhao, Jiawei Li, Mingxuan Liu, Huiying Wang, Yuling Chen, Ying Wu

doi:10.1097/nne.0000000000001679

Abstract

The performance of GPT-4 on nursing examinations in the Chinese context has not yet been thoroughly evaluated. This study aimed to assess the performance of GPT-4 on multiple-choice and open-ended questions derived from nursing examinations in the Chinese context. Data sets from the Chinese National Nursing Licensure Examination spanning 2021 to 2023 were used to evaluate the accuracy of GPT-4 on multiple-choice questions, and its performance on open-ended questions was examined using 18 case-based questions. On the multiple-choice questions, GPT-4 achieved an accuracy of 71.0% (511/720). Its responses to the open-ended questions were evaluated for cosine similarity, logical consistency, and information quality, all of which were found to be at a moderate level. GPT-4 performed well on questions about basic knowledge but showed notable limitations in answering open-ended questions. Nursing educators should weigh the benefits and challenges of GPT-4 before integrating it into nursing education.
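The two quantitative measures named above can be illustrated with a short sketch. Accuracy is simply correct items over total items (511/720 rounds to 71.0%). Cosine similarity compares a model response with a reference answer as vectors. The abstract does not say how texts were vectorized, so the whitespace term-frequency representation below (the hypothetical helper `term_frequencies`) is an assumption for illustration only; Chinese responses would in practice need a word segmenter or a sentence-embedding model in its place.

```python
import math
from collections import Counter


def accuracy(correct: int, total: int) -> float:
    """Proportion of multiple-choice items answered correctly."""
    return correct / total


def term_frequencies(text: str) -> Counter:
    # Hypothetical representation (not the study's method): lowercase
    # whitespace tokens counted into a sparse term-frequency vector.
    return Counter(text.lower().split())


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(f"Multiple-choice accuracy: {accuracy(511, 720):.1%}")  # 71.0%
    reference = term_frequencies("assess airway and breathing first")
    response = term_frequencies("first assess the airway")
    print(f"Cosine similarity: {cosine_similarity(reference, response):.2f}")
```

Cosine similarity ranges from 0 (no shared terms) to 1 (identical term proportions) for non-negative vectors like these, which is why it captures lexical or semantic overlap with a reference answer but says nothing about logical consistency or information quality; the study assessed those separately.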
