Femoral head fractures are rare but potentially disabling injuries, and classifying them accurately and consistently can help surgeons choose appropriate treatment. However, there is no consensus as to which classification of these fractures is the most advantageous; parameters that might inform this choice include universality (the proportion of fractures that can be classified) and interobserver and intraobserver reproducibility.

We asked: (1) Which classification achieves the best universality? (2) Which classification delivers the highest intraobserver and interobserver reproducibility in the clinical CT assessment of femoral head fractures? (3) Based on the answers to those two questions, which classifications are the most applicable for clinical practice and research?

Between January 2011 and January 2023, 254 patients with femoral head fractures who had CT scans (CT is routine at our institution for patients who have experienced severe hip trauma) were potentially eligible for inclusion in this study, which was performed at a large Level I trauma center in China. Of those, 9% (23 patients) were excluded because of poor-quality CT images, unclosed physes, pathologic fractures, or acetabular dysplasia, leaving 91% (231 patients with 231 hips) for analysis. Among those, 19% (45) were female. At the time of injury, the mean age was 40 ± 17 years. All fractures were independently classified by four observers according to the Pipkin, Brumback, AO/Orthopaedic Trauma Association (OTA), Chiron, and New classifications. Each observer repeated the classifications 1 month later to allow us to ascertain intraobserver reliability. To evaluate the universality of the classifications, we calculated the percentage of hips that could be classified using the definitions offered in each classification.
The kappa (κ) value was calculated to determine interrater and intrarater agreement. We then compared the classifications based on the combination of universality and interobserver and intraobserver reproducibility to determine which classifications might be recommended for clinical and research use.

The universalities of the classifications were 99% (228 of 231, Pipkin), 43% (99 of 231, Brumback), 94% (216 of 231, AO/OTA), 99% (228 of 231, Chiron), and 100% (231 of 231, New). The interrater agreement was judged as almost perfect (κ 0.81 [95% CI 0.78 to 0.84], Pipkin), moderate (κ 0.51 [95% CI 0.44 to 0.59], Brumback), fair (κ 0.28 [95% CI 0.18 to 0.38], AO/OTA), substantial (κ 0.79 [95% CI 0.76 to 0.82], Chiron), and substantial (κ 0.63 [95% CI 0.58 to 0.68], New). The intrarater agreement was judged as almost perfect (κ 0.89 [95% CI 0.83 to 0.96]), substantial (κ 0.72 [95% CI 0.69 to 0.75]), moderate (κ 0.51 [95% CI 0.43 to 0.58]), almost perfect (κ 0.87 [95% CI 0.82 to 0.91]), and substantial (κ 0.78 [95% CI 0.59 to 0.97]) for the Pipkin, Brumback, AO/OTA, Chiron, and New classifications, respectively. Based on these findings, we determined that the Pipkin and Chiron classifications offer near-complete universality and sufficient interobserver and intraobserver reproducibility to recommend them for clinical and research use, but the other classifications (Brumback, AO/OTA, and New) do not.

Clinicians and clinician-scientists can use either the Pipkin or the Chiron classification system, with equal confidence, to classify femoral head fractures based on CT images. It seems unlikely that any new classifications will substantially outperform these, and the other available systems lacked either sufficient universality or sufficient reproducibility to recommend them for general use.

Level III, diagnostic study.
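
The universality percentages and the κ-based agreement labels reported above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which κ variant was used, so Fleiss' κ (one common choice when more than two observers rate every case) is assumed here; the cutoffs in `agreement_label` are the widely used Landis and Koch bands, which match the labels in the abstract but are not named in it; and all function names and the toy ratings are hypothetical.

```python
from collections import Counter


def universality(n_classifiable: int, n_total: int) -> float:
    """Proportion of fractures that a classification can classify."""
    return n_classifiable / n_total


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds one category label per rater for hip i.

    Assumes every hip was rated by the same number of raters and that at
    least two categories occur overall (otherwise the denominator is zero).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: mean over hips of the share of agreeing rater pairs.
    p_obs = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters)
        for cat in categories
    ]
    p_exp = sum(s * s for s in shares)

    return (p_obs - p_exp) / (1 - p_exp)


def agreement_label(kappa: float) -> str:
    """Map kappa to a verbal label (Landis and Koch bands, assumed here)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Classifiable counts out of 231 hips, as reported in the abstract.
    reported = {"Pipkin": 228, "Brumback": 99, "AO/OTA": 216,
                "Chiron": 228, "New": 231}
    for name, n in reported.items():
        print(f"{name}: {universality(n, 231):.0%}")

    # Hypothetical toy data: three hips, four observers each.
    toy = [["I", "I", "I", "I"],
           ["II", "II", "II", "II"],
           ["I", "I", "II", "II"]]
    k = fleiss_kappa(toy)
    print(f"toy kappa = {k:.2f} ({agreement_label(k)})")
```

The sketch omits the 95% confidence intervals reported in the abstract; estimating those would need an additional step, such as a bootstrap over hips.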