Abstract

Objective: To quantify ChatGPT's concordance with expert otolaryngologists when posed with high-level questions that require blending rote memorization with critical thinking.

Study Design: Cross-sectional survey.

Setting: OpenAI's ChatGPT-3.5 platform.

Methods: Two board-certified otolaryngologists (HZ, RS) input 2 sets of 30 text-based questions (open-ended and single-answer multiple-choice) into the ChatGPT-3.5 model. Responses were rated as correct, partially correct, or incorrect by each otolaryngologist working simultaneously with the AI model. Interrater agreement percentages were analyzed using the binomial distribution to calculate 95% confidence intervals and perform significance tests. Statistical significance was defined as P < .05 for 2-sided tests.

Results: For open-ended questions, ChatGPT answered 56.7% of questions with complete accuracy and 86.7% with at least partial accuracy on initial querying (corrected agreement = 80.1%; P < .001). On repeat querying, ChatGPT improved to 73.3% with complete accuracy and 96.7% with at least partial accuracy (corrected agreement = 88.8%; P < .001). For multiple-choice questions, ChatGPT performed substantially worse (43.3% correct).

Conclusion: ChatGPT does not currently provide reliably accurate responses to sophisticated questions in otolaryngology. Professional societies must be aware of this tool's potential, guard against unscrupulous use in test-taking situations, and consider guidelines for clinical scenarios. Expert clinical oversight remains necessary for myriad use cases (eg, hallucination).
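The analysis reported above rests on binomial proportions with 95% confidence intervals and 2-sided significance tests. The sketch below is a minimal illustration of that kind of calculation, not the authors' code: the counts out of 30 questions are inferred from the reported percentages, and the null proportion of 0.5 is an assumption, since the abstract does not state the tested null value or the method behind the "corrected agreement" figures.

```python
# Minimal sketch (not the authors' analysis): exact binomial 95% CIs and
# two-sided tests for the reported proportions. Counts out of 30 are inferred
# from the percentages in the abstract; the null proportion p=0.5 is assumed.
from scipy.stats import binomtest

N_QUESTIONS = 30
outcomes = {
    "open-ended, complete accuracy (initial)": 17,  # ~56.7%
    "open-ended, at least partial (initial)":  26,  # ~86.7%
    "open-ended, complete accuracy (repeat)":  22,  # ~73.3%
    "open-ended, at least partial (repeat)":   29,  # ~96.7%
    "multiple-choice, correct":                13,  # ~43.3%
}

for label, successes in outcomes.items():
    result = binomtest(successes, n=N_QUESTIONS, p=0.5, alternative="two-sided")
    ci = result.proportion_ci(confidence_level=0.95)  # Clopper-Pearson exact CI
    print(f"{label}: {successes}/{N_QUESTIONS} = {successes / N_QUESTIONS:.1%}, "
          f"95% CI [{ci.low:.1%}, {ci.high:.1%}], P = {result.pvalue:.3f}")
```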
