Introduction
Artificial intelligence (AI)-powered tools are increasingly integrated into healthcare. The purpose of the present study was to compare fracture management plans generated by clinicians with those obtained from ChatGPT (OpenAI, San Francisco, CA) and Google Gemini (Google, Inc., Mountain View, CA).

Methodology
A retrospective comparative analysis was conducted. The study included 70 cases of isolated injuries treated at the fracture clinic of the authors' institution. Complex or open fractures and cases with non-specific diagnoses were excluded. All relevant clinical details were entered into ChatGPT and Google Gemini, and the AI-generated management plans were compared with the actual plans documented in the clinical records. The study focused on treatment recommendations and follow-up strategies.

Results
In terms of agreement with actual treatment plans, Google Gemini matched the clinician plan in only 13 of the 70 cases (19%), with disagreements in the remaining cases attributed to overgeneralisation, inadequate treatment recommendations, and ambiguity. In contrast, ChatGPT matched actual plans in 24 cases (34%), with overgeneralisation being the principal cause of disagreement. The differences between the AI-generated plans and the actual clinician-led plans were statistically significant (p < 0.001).

Conclusion
Both AI-powered tools demonstrated significant disagreement with actual clinical management plans. While ChatGPT showed closer alignment with human expertise, particularly in treatment recommendations, both AI engines still lacked the clinical precision required for accurate fracture management. These findings highlight the current limitations of general-purpose AI-powered tools and indicate that they cannot replace a clinician-led fracture clinic appointment.
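The abstract does not state which statistical test produced p < 0.001. As an illustration only, the sketch below reproduces the reported agreement arithmetic (13/70 and 24/70) and adds exact binomial 95% confidence intervals for each tool's agreement proportion; this is an assumed supplementary analysis using scipy, not the authors' method.

```python
from scipy.stats import binomtest

N = 70  # isolated fracture cases reviewed in the study

# Reported agreement counts between AI-generated and documented clinician plans
counts = {"Google Gemini": 13, "ChatGPT": 24}

for tool, k in counts.items():
    # binomtest is used here only to obtain an exact (Clopper-Pearson)
    # 95% CI for the agreement proportion; this test choice is an
    # assumption, as the abstract does not name the analysis performed
    ci = binomtest(k, N).proportion_ci(confidence_level=0.95, method="exact")
    print(f"{tool}: {k}/{N} = {k / N:.0%} agreement "
          f"(95% CI {ci.low:.0%}-{ci.high:.0%})")
```

Running this reproduces the 19% and 34% figures quoted in the Results, with confidence intervals for both tools lying well below full agreement, consistent with the abstract's conclusion that neither tool matched clinician-led planning.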