<h3>BACKGROUND CONTEXT</h3> Previous AO Spine classifications have demonstrated substantial accuracy, interobserver reliability and intraobserver reproducibility regardless of geographic region (as defined by the six AO regions of the world: North America, Latin/South America, Africa, Asia and Oceania, Middle East and Europe). However, upper cervical spine injuries are often quickly transferred to academic/tertiary care centers, and as such, may have much greater variability in regional classification ability. The AO Spine Upper Cervical Injury Classification is based on injury level (I. occipital condyle and craniocervical junction, II. C1 and C1-C2 joint, and III. C2 and C2-3 joint) and type of injury (A: bony injury, B: tension band injury, C: fracture dislocation).
<h3>PURPOSE</h3> To determine the global variability of the intraobserver reproducibility, interobserver reliability and percentage agreement compared to the gold standard (as determined by members of the AO Spine Knowledge Forum Trauma) for the AO Spine Upper Cervical Injury Classification.
<h3>STUDY DESIGN/SETTING</h3> Global cross-sectional survey.
<h3>PATIENT SAMPLE</h3> A total of 275 AO Spine members.
<h3>OUTCOME MEASURES</h3> Percentage agreement with the gold standard, intraobserver reproducibility and interobserver reliability.
<h3>METHODS</h3> A total of 275 AO Spine members responded to an open invitation to participate in the AO Spine Upper Cervical Injury Classification validation. A live, online assessment with all participants present simultaneously was performed twice at a three-week interval. A total of 25 consecutive upper cervical spine computed tomography (CT) videos, which included axial, sagittal and coronal videos, were played at a rate of 2 frames/second. A REDCap survey was used to capture all participants' classification grades. Pearson's chi-square test was used to compare geographic regions, with significance set at P<0.05. 
Fleiss' kappa (κ) was used to assess the interobserver reliability and intraobserver reproducibility (κ=0-0.20 was categorized as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial and 0.81-1.0 as excellent).
<h3>RESULTS</h3> The majority of participants were from Europe (41.2%) and Asia (23.5%), with the remainder from Central/South America (13.6%), North America (9.5%), the Middle East (7%), and Africa (5.3%). Only 43.6% worked in an academic center and 69.8% worked at a level I trauma center. Participants from Europe, the Middle East, Asia and North America had the highest percentage agreement compared to the gold standard (range: 75.1-82.9% on both assessments), while participants from Africa and Central/South America (range: 68.9-71.2% on both assessments) had a significantly lower agreement percentage (p<0.001). The interobserver reliability was substantial (κ=0.61-0.682) for Asia, Europe, the Middle East, and North America, but it was only moderate for Africa and Central/South America (κ=0.487 and 0.466, respectively). The intraobserver reproducibility was substantial for North America, Europe, and Asia (κ=0.70, 0.71 and 0.61, respectively), excellent for the Middle East (κ=0.81), and moderate for Africa and Central/South America (κ=0.57 and 0.55, respectively).
<h3>CONCLUSIONS</h3> Significant variation in the percentage agreement compared to the gold standard was identified based on geographic region. Participants in Africa and Central/South America were less accurate than those in Asia, Europe, North America and the Middle East. The interobserver reliability and intraobserver reproducibility also showed significant variation between world regions. This may be partly due to lower CT scan availability in these regions and the infrequency with which these injuries present to non-trauma or non-academic centers, which may make these types of injuries more difficult to accurately identify for some participants. 
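For readers unfamiliar with the statistic named in the Methods, the following is a minimal sketch of the textbook Fleiss' kappa formula together with the kappa bands quoted above. It is not the study's analysis code: the function names `fleiss_kappa` and `interpret_kappa` are hypothetical, the toy counts are invented for illustration and are not study data, and treating each band's upper bound as inclusive is an assumption, since the abstract leaves values such as 0.205 unassigned.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Textbook Fleiss' kappa.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement: mean proportion of agreeing rater pairs per case.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa: float) -> str:
    """Map kappa to the bands quoted in the abstract (upper bounds inclusive,
    which is an assumption). Negative values are not categorized there."""
    if kappa < 0:
        return "below range"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "excellent"


if __name__ == "__main__":
    # Invented toy data: 4 cases, 5 raters, 3 injury types (A, B, C).
    toy = [[5, 0, 0], [4, 1, 0], [0, 5, 0], [1, 1, 3]]
    k = fleiss_kappa(toy)
    print(f"kappa = {k:.3f} ({interpret_kappa(k)})")
```

Applied to the reported regional values, `interpret_kappa(0.487)` gives "moderate" and `interpret_kappa(0.81)` gives "excellent", matching the labels used in the Results.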
<h3>FDA DEVICE/DRUG STATUS</h3> This abstract does not discuss or include any applicable devices or drugs.