Introduction
ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. In previous studies, ChatGPT 3.5 was unable to pass the FRCS (Tr&Orth) examination, owing to a lack of higher-order judgement. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.

Methodology
Questions from the UK-based December 2022 In-Training Examination were input into ChatGPT 3.5 and 4.0. The methodology of a prior study was replicated to maintain consistency, allowing a direct comparison between the two model versions. The performance threshold remained at 65.8%, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).

Results
ChatGPT 4.0 achieved a passing score (73.9%), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. ChatGPT 3.5 scored 38.1% lower than version 4.0, a statistically significant difference (p<0.0001; Chi-square). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 also showed a statistically significant improvement in answering image-based questions (p=0.0069) compared with its predecessor.

Conclusion
ChatGPT 4.0's success in passing Section 1 of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.