Performance of the streamlined quality outcomes database web-based calculator: internal and external validation

Leah Y Carreon, Hui Nian, Kristin R Archer, Mikkel Ø Andersen, Karen Højmark Hansen, Steven D Glassman

doi:10.1016/j.spinee.2023.11.024

Abstract

BACKGROUND CONTEXT: With an increasing number of web-based calculators designed to provide the probabilities of an individual achieving improvement after lumbar spine surgery, there is a need to determine the accuracy of these models.

PURPOSE: To perform an internal and external validation study of the reduced Quality Outcomes Database web-based Calculator (QOD-Calc).

STUDY DESIGN: Observational longitudinal cohort.

PATIENT SAMPLE: Patients enrolled study-wide in the Quality Outcomes Database (QOD) and patients enrolled in DaneSpine at a single institution who had elective lumbar spine surgery, with the baseline data needed to complete QOD-Calc and 12-month postoperative data.

OUTCOME MEASURES: Oswestry Disability Index (ODI), Numeric Rating Scales (NRS) for back and leg pain, EuroQOL-5D (EQ-5D).

METHODS: Baseline data elements were entered into QOD-Calc to determine the probability of each patient having Any Improvement and 30% Improvement in NRS leg pain, NRS back pain, EQ-5D and ODI. These probabilities were compared with the actual 12-month postoperative data for each of the QOD and DaneSpine cases. Receiver-operating characteristics analyses were performed and calibration plots created to assess model performance.

RESULTS: A total of 24,755 QOD cases and 8,105 DaneSpine lumbar cases were included in the analysis. QOD-Calc had acceptable to outstanding ability (AUC: 0.694–0.874) to predict Any Improvement in the QOD cohort and moderate to acceptable ability (AUC: 0.658–0.747) to predict 30% Improvement. QOD-Calc had acceptable to exceptional ability (AUC: 0.669–0.734) to predict Any Improvement and moderate to exceptional ability (AUC: 0.619–0.862) to predict 30% Improvement in the DaneSpine cohort. AUCs for the DaneSpine cohort were consistently lower than the AUCs for the QOD validation cohort.

CONCLUSION: QOD-Calc performs well in predicting outcomes in a patient population that is similar to the population that was used to develop it. Although still acceptable, model performance was slightly worse in a distinct population, despite the fact that the sample was more homogeneous. Model performance may also be attributed to the low discrimination threshold, with close to 90% of cases reporting Any Improvement in outcome. Prediction models may need to be developed that are highly specific to the characteristics of the population.
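
The receiver-operating characteristics and calibration analyses named in the Methods can be illustrated with a minimal sketch. It assumes each case has a predicted probability from the calculator and a binary 12-month outcome (1 = improved, 0 = not improved). The function names `auc` and `calibration_bins` and the toy data are hypothetical and are not taken from the study; the abstract does not state which software or binning scheme the authors used.

```python
from typing import List, Sequence, Tuple


def auc(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (improved, not-improved) pairs in which the improved case received
    the higher predicted probability. Ties count as half a pair."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one improved and one non-improved case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group cases into equal-width probability bins and return, for each
    non-empty bin, (mean predicted probability, observed rate, case count).
    Plotting observed rate against mean prediction gives a calibration plot."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows


# Toy data, invented for illustration only (not study data).
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_outcomes = [1, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    print("AUC:", auc(toy_probs, toy_outcomes))
    for mean_pred, observed, count in calibration_bins(toy_probs, toy_outcomes, n_bins=2):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

The pairwise loop is quadratic in the number of cases, so for cohorts of the size reported here a rank-based routine from a statistics library would be the practical choice. The sketch also shows why AUC and calibration are reported separately: AUC depends only on how cases are ranked, while the calibration bins compare predicted probabilities with observed rates.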

Full Text