Assessment of Two Thoracolumbar Fracture Classification Systems as Used by Multiple Surgeons

Kirkham B Wood

doi:10.2106/jbjs.c.01530

Abstract

The reproducibility and repeatability of modern systems for classification of thoracolumbar injuries have not been sufficiently studied. We assessed the interobserver and intraobserver reproducibility of the AO (Arbeitsgemeinschaft für Osteosynthesefragen) classification and compared it with that of the Denis classification. Our purpose was to determine whether the newer AO system had better reproducibility than the older Denis classification.
Anteroposterior and lateral radiographs and computerized tomography scans (axial images and sagittal reconstructions) of thirty-one acute traumatic fractures of the thoracolumbar spine were presented to nineteen observers, all trained spine surgeons, who classified the fractures according to both the AO and the Denis classification systems. Three months later, the images of the thirty-one fractures were scrambled into a different order, and the observers repeated the classification. The Cohen kappa (κ) statistic was used to determine interobserver and intraobserver agreement, which was measured with regard to the three basic classifications in the AO system (types A, B, and C) as well as the nine subtypes of that system. We also measured the agreement with regard to the four basic types in the Denis classification (compression, burst, seat-belt, and fracture-dislocation) and with regard to the sixteen subtypes of that system.
The AO classification was fairly reproducible, with an average kappa of 0.475 (range, 0.389 to 0.598) for the agreement regarding the assignment of the three types and an average kappa of 0.537 for the agreement regarding the nine subtypes. The average kappa for the agreement regarding the assignment of the four Denis fracture types was 0.606 (range, 0.395 to 0.702), and it was 0.173 for agreement regarding the sixteen subtypes. The intraobserver agreement (repeatability) was 82% and 79% for the AO and Denis types, respectively, and 67% and 56% for the AO and Denis subtypes, respectively.
Both the Denis and the AO system for the classification of spine fractures had only moderate reliability and repeatability. The tendency for well-trained spine surgeons to classify the same fracture differently on repeat testing is a matter of some concern.
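
For readers unfamiliar with the agreement measures named above, the following is a minimal sketch of how a Cohen kappa between two observers, an average kappa across several observers, and a simple percent agreement between two reading sessions could be computed. It is illustrative only: the abstract does not state how the per-pair values were combined, so averaging over all observer pairs is an assumption, and the function names (`cohen_kappa`, `mean_pairwise_kappa`, `percent_agreement`) and the small data set are hypothetical, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same label.
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label for every case.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_pairwise_kappa(all_ratings):
    """Average kappa over every pair of observers (an assumed scheme)."""
    pairs = list(combinations(all_ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def percent_agreement(first_session, second_session):
    """Fraction of cases one observer classified identically both times."""
    n = len(first_session)
    return sum(x == y for x, y in zip(first_session, second_session)) / n


# Hypothetical AO type assignments (A, B, or C) for six fractures
# by three observers; these are NOT data from the study.
observers = [
    ["A", "A", "B", "C", "B", "A"],
    ["A", "B", "B", "C", "B", "A"],
    ["A", "A", "B", "B", "C", "A"],
]

if __name__ == "__main__":
    print("mean pairwise kappa:", round(mean_pairwise_kappa(observers), 3))
    print("repeat agreement:", percent_agreement(observers[0], observers[1]))
```

Because kappa subtracts the agreement expected by chance, it can be noticeably lower than raw percent agreement for the same set of readings, which is worth keeping in mind when comparing the kappa values with the percentages reported above.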

Full Text