Artificial intelligence (AI) is a promising tool for improving diagnostic accuracy and patient communication. Prior literature has shown that ChatGPT can answer medical questions and accurately diagnose surgical conditions. The purpose of this study was to determine the accuracy of ChatGPT 4.0 in evaluating radiologic imaging of common orthopedic upper extremity bony pathologies, including identifying the imaging modality and rendering an accurate diagnosis. Diagnostic imaging was sourced from an open-source radiology database for 6 common upper extremity bony pathologies: distal radius fracture (DRF), metacarpal fracture (MFX), carpometacarpal osteoarthritis (CMC), humerus fracture (HFX), scaphoid fracture (SFX), and scaphoid nonunion (SN). X-ray, computed tomography (CT), and magnetic resonance imaging (MRI) modalities were included. Fifty images were randomly selected for each pathology where possible. Images were uploaded to ChatGPT 4.0 and queried for imaging modality, laterality, and diagnosis. Each image query was completed in a new ChatGPT search tab. Multinomial regression was used to identify variations in ChatGPT's diagnostic accuracy across imaging modalities and medical conditions. Overall, ChatGPT provided a diagnosis for 52% of images, with diagnostic accuracy ranging from 0% to 55% across pathologies. Diagnostic accuracy was significantly lower for SFX and MFX relative to HFX. ChatGPT was significantly less likely to provide a diagnosis for MRI relative to CT. Across imaging modalities (X-ray, CT, MRI), diagnostic accuracy ranged from 0% to 40%, though these differences were not statistically significant. ChatGPT's accuracy varied considerably across conditions and imaging modalities, though its iterative learning capabilities suggest potential for future diagnostic utility within hand surgery.
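The study queried the ChatGPT 4.0 web interface directly, opening a fresh tab for each image so no conversational context carried over between queries. For readers who want to approximate that protocol programmatically, a minimal sketch using the OpenAI Python SDK is below; the model name, prompt wording, and directory layout are assumptions for illustration, not details taken from the study.

```python
import base64
from pathlib import Path

from openai import OpenAI  # assumes the official `openai` Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "What imaging modality is this (X-ray, CT, or MRI)? "
    "What is the laterality? What is the most likely diagnosis?"
)

def query_image(path: Path) -> str:
    """Send one image in its own request with no shared history,
    mirroring the study's one-image-per-fresh-tab protocol."""
    b64 = base64.b64encode(path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical stand-in for "ChatGPT 4.0"
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical layout: one folder of images per pathology.
for image in sorted(Path("images/drf").glob("*.png")):
    print(image.name, "->", query_image(image))
```

Sending each image in an independent request reproduces the key design choice of the study: the model cannot learn from, or be biased by, its answers to earlier images.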
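The abstract reports a multinomial regression of ChatGPT's responses on imaging modality and condition. A minimal sketch of one way such a model could be fit, assuming a multinomial logistic specification (statsmodels' MNLogit) and a hypothetical long-format dataset with one row per image query, follows; the outcome coding, file name, and reference levels are illustrative assumptions, not the authors' analysis code.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format data: one row per image query, with
# "outcome" coding the response (0 = no diagnosis offered,
# 1 = incorrect diagnosis, 2 = correct diagnosis).
df = pd.read_csv("chatgpt_responses.csv")  # assumed file and columns

# Set reference levels to match the comparisons reported above
# (HFX for condition, CT for modality).
df["condition"] = pd.Categorical(
    df["condition"], categories=["HFX", "DRF", "MFX", "CMC", "SFX", "SN"])
df["modality"] = pd.Categorical(
    df["modality"], categories=["CT", "X-ray", "MRI"])

# Dummy-code predictors; drop_first removes each reference level.
X = sm.add_constant(
    pd.get_dummies(df[["modality", "condition"]], drop_first=True, dtype=float))

# Multinomial (logistic) regression of response category on
# imaging modality and pathology.
fit = sm.MNLogit(df["outcome"], X).fit()
print(fit.summary())
```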