Quality control, benchmarking, and pay for performance (P4P) require valid indicators and statistical models allowing adjustment for differences in risk profiles of the patient populations of the respective institutions. Using hospital remuneration data for measuring quality and modelling patient risks has been criticized by clinicians. Here we explore the potential of prediction models for 30- and 90-day mortality after colorectal cancer surgery based on routine data. Full census of a major statutory health insurer. Surgical departments throughout the Federal Republic of Germany. 4283 and 4124 insurants with major surgery for treatment of colorectal cancer during 2013 and 2014, respectively. Age, sex, primary and secondary diagnoses as well as tumor locations as recorded in the hospital remuneration data according to §301 SGB V. 30- and 90-day mortality. Elixhauser comorbidities, Charlson conditions, and Charlson scores were generated from the ICD-10 diagnoses. Multivariable prediction models were developed using a penalized logistic regression approach (logistic ridge regression) in a derivation set (patients treated in 2013). Calibration and discrimination of the models were assessed in an internal validation sample (patients treated in 2014) using calibration curves, Brier scores, receiver operating characteristic curves (ROC curves) and the areas under the ROC curves (AUC). 30- and 90-day mortality rates in the learning-sample were 5.7 and 8.4%, respectively. The corresponding values in the validation sample were 5.9% and once more 8.4%. Models based on Elixhauser comorbidities exhibited the highest discriminatory power with AUC values of 0.804 (95% CI: 0.776 -0.832) and 0.805 (95% CI: 0.782-0.828) for 30- and 90-day mortality. The Brier scores for these models were 0.050 (95% CI: 0.044-0.056) and 0.067 (95% CI: 0.060-0.074) and similar to the models based on Charlson conditions. Regardless of the model, low predicted probabilities were well calibrated, while higher predicted values tended to be overestimates. The reasonable results regarding discrimination and calibration notwithstanding, models based on hospital remuneration data may not be helpful for P4P. Routine data do not offer information regarding a wide range of quality indicators more useful than mortality. As an alternative, models based on clinical registries may allow a wider, more valid perspective.
Read full abstract