Background: Available data on radiologists' missed cervical spine fractures are based primarily on studies using human reviewers to identify errors on re-evaluation; such studies do not capture the full extent of missed fractures. Objective: To use machine-learning (ML) models to identify cervical spine fractures on CT missed by interpreting radiologists, characterize the nature of these fractures, and assess their clinical significance. Methods: This retrospective study included all cervical spine CT examinations performed in adult patients in the emergency department between January 1, 2018 and December 31, 2022. Examinations reported as negative for cervical spine fracture were processed by seven award-winning ML models from the 2022 RSNA Cervical Spine Fracture AI Challenge; examinations classified as positive by at least four of seven models were considered to have ML-detected fractures. Two neuroradiologists independently reviewed examinations with ML-detected fractures, using ML-derived heat maps, to identify those representing true missed fractures. The neuroradiologists further assessed fractures' extent. Two spine surgeons independently assessed whether missed fractures were clinically significant (i.e., warranting at least one of surgical consultation, MRI, CTA, or collar immobilization). Results: The study included 6671 patients (2414 female, 4257 male; mean age, 54.6±22.1 years) who underwent a total of 6979 cervical spine CT examinations. Interpreting radiologists reported 6378 examinations as negative for fracture. Of these, 356 had ML-detected fractures (i.e., positive by ≥4 of 7 models). The neuroradiologists classified 40 of these examinations, in 39 unique patients, as having true fractures. ML-detected missed true fractures involved 51 unique sites, most commonly the C7 transverse process (n=12), C5 spinous process (n=12), and C6 spinous process (n=8). 
The surgeons considered missed fractures clinically significant in 15/40 examinations [MRI and collar immobilization (n=7), MRI and surgical evaluation (n=1), CTA (n=9)]. Interobserver agreement, expressed as kappa, was 0.88 between neuroradiologists for true fracture classification and 0.94 between surgeons for clinical significance classification. Conclusion: ML models identified cervical spine fractures missed by radiologists. These fractures were further characterized to systematically highlight radiologists' common misses. Clinical Impact: This ML-based framework can be applied in quality improvement efforts, to help refine radiologists' search patterns based on prone-to-miss findings.
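The consensus rule in the Methods (an examination counts as having an ML-detected fracture when at least four of the seven models classify it as positive) can be illustrated with a minimal sketch. The abstract does not describe how each model's output was converted to a positive or negative call, so the sketch assumes each model has already produced one examination-level boolean; the function and constant names are hypothetical and are not taken from the study's code.

```python
from typing import Sequence

# Values stated in the abstract: seven models, positive if at least four agree.
N_MODELS = 7
MIN_POSITIVE_VOTES = 4


def ml_detected_fracture(model_calls: Sequence[bool],
                         min_votes: int = MIN_POSITIVE_VOTES) -> bool:
    """Return True when enough models call the examination positive.

    model_calls holds one boolean per model (hypothetical input format:
    True means that model classified the examination as fracture-positive).
    """
    if len(model_calls) != N_MODELS:
        raise ValueError(f"expected {N_MODELS} model calls, got {len(model_calls)}")
    positive_votes = sum(1 for call in model_calls if call)
    return positive_votes >= min_votes


# Example: four positive calls meet the threshold, three do not.
flagged = ml_detected_fracture([True, True, True, True, False, False, False])
not_flagged = ml_detected_fracture([True, True, True, False, False, False, False])
```

Under this rule, only examinations that were reported as negative and then flagged would go on to neuroradiologist review, which is the step that separates true missed fractures from false-positive ML calls.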