Abstract
Introduction
ChatGPT is a Natural Language Processing (NLP) model, trained via machine learning algorithms, that has proven capable of passing multiple selective medical tests. The aim of this study was to evaluate its "self-learning" capacity over a 90-day interval in solving specialized medical questions.

Methods
A comparative descriptive analysis of the problem-solving ability of ChatGPT-3.5 (10/02/2023 - 15/02/2023) versus ChatGPT-4 (11/05/2023 - 13/05/2023) was conducted on the competitive exam for the position of Specialist in Thoracic Surgery, as announced by the Andalusian Health Service in 2022. This competitive exam was chosen because of its multiple-choice structure and its division into a theoretical questionnaire and a practical questionnaire.

Results
ChatGPT-3.5 achieved an overall success rate of 58.90% (86/146): 63.2% (62/98) on the theoretical questionnaire and 50% (24/48) on the practical questionnaire. ChatGPT-4 achieved 65.7% (96/146): 71.43% (70/98) on the theoretical questionnaire and 54.16% (26/48) on the practical questionnaire. Inferential analysis showed no significant differences (p>0.05) in the rate of correct answers between the two versions, either overall or on the theoretical or practical questionnaire separately.

Conclusion
Despite the continual "self-learning" of this Artificial Intelligence model, its performance in solving specialized medical questions, which require critical reasoning, remains an ongoing developmental challenge.
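The abstract does not name the statistical test behind the p>0.05 result. As an illustrative sketch only, assuming a Pearson chi-squared test on the reported overall counts (86/146 correct for ChatGPT-3.5 vs. 96/146 for ChatGPT-4), the non-significant finding can be reproduced:

```python
# Illustrative re-check of the abstract's overall comparison.
# Assumption: a Pearson chi-squared test on a 2x2 contingency table;
# the paper does not state which inferential test was actually used.

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Expected counts under the null hypothesis of independence
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Reported overall results: correct vs. incorrect answers per version
stat = chi2_2x2(86, 146 - 86,   # ChatGPT-3.5: 86 correct, 60 incorrect
                96, 146 - 96)   # ChatGPT-4:  96 correct, 50 incorrect

# The critical value for df=1 at alpha=0.05 is 3.841;
# a statistic below it corresponds to p > 0.05.
print(round(stat, 3), stat < 3.841)  # → 1.459 True
```

With a statistic of about 1.46 (df=1), well below the 3.841 critical value, the overall difference between the two versions is indeed not significant at the 0.05 level, consistent with the abstract.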