We read with great interest the article by Nashef et al. [1] regarding the construction and validation of the EuroSCORE II risk model. This manuscript is an outstanding contribution in the creation of a risk stratification model that is robust enough for use in cardiac surgery worldwide. However, we would like to know which statistical procedure should be used to determine the calibration of EuroSCORE II. The Hosmer–Lemeshow goodness-of-fit test has been the most popular test of calibration [2, 3], measuring the differences between expected and observed outcomes (mortality) over deciles of risk (test results are acceptable with the cohort divided into at least terciles). A well-calibrated model gives a corresponding P-value >0.05. Recently, it has been claimed that a non-significant Hosmer–Lemeshow test means only that there is no evidence of poor calibration, not that calibration is good [4]. In our opinion, statistical results are either significant or non-significant, black or white; there are no grey results in statistics. If that is true, however, how has it come to pass that we needed more than 10 years (during which the Hosmer–Lemeshow statistic was used in more than 95% of manuscripts to test the calibration of the additive and logistic EuroSCORE in cardiac surgery) to find out that the Hosmer–Lemeshow test is no longer valid for determining calibration, and that it should be replaced with the risk-adjusted mortality ratio [RAMR = observed/predicted (expected) mortality, i.e. the O/E ratio], as now suggested by Nashef et al. [1]? An O/E ratio of 1.0 means that the score predicts mortality perfectly. An O/E ratio >1.0 means that the model underpredicts mortality [in EuroSCORE II, for a validation data set of 5553 patients, the O/E ratio was 1.058 (4.18%/3.95%)], while an O/E ratio <1.0 means that the model overpredicts mortality. However, how are we going to check the statistical significance of the RAMR value?
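The decile-based comparison behind the Hosmer–Lemeshow test can be sketched as follows. This is a minimal illustration in Python; the function name and the toy data are ours, not drawn from the letter or from any EuroSCORE dataset:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test (minimal sketch).

    Patients are sorted by predicted risk and split into `groups`
    equal-sized bins (deciles by default); observed and expected
    deaths per bin are compared with a chi-squared statistic on
    groups - 2 degrees of freedom.
    """
    order = np.argsort(p_pred, kind="stable")
    y = np.asarray(y_true)[order]
    p = np.asarray(p_pred)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):
        obs = y[idx].sum()        # observed deaths in this risk bin
        exp = p[idx].sum()        # expected deaths in this risk bin
        n = len(idx)
        mean_p = exp / n
        stat += (obs - exp) ** 2 / (n * mean_p * (1 - mean_p))
    return stat, chi2.sf(stat, groups - 2)  # statistic, P-value
```

A P-value above 0.05 is then read, as the letter notes, as absence of evidence of poor calibration rather than proof of good calibration.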
Conceptually, if the observed number of deaths equals the expected number of deaths (as predicted by the scoring system), the RAMR has a value of 1.0. Thus, the statistical test for the significance of the RAMR is whether it differs from 1.0. To gauge the statistical significance of the RAMR, we must first calculate its 95% confidence interval. If the 95% confidence interval excludes the value 1.0, the result may be considered statistically significant (no matter whether the model overpredicts or underpredicts mortality). In contrast, Bhatti et al. [5] suggested χ2 statistics
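One common way to attach a confidence interval to the O/E ratio is to treat the observed death count as a Poisson variable and take exact limits from chi-squared quantiles. The sketch below is our own illustration of that approach; the function name and the Poisson assumption are ours, not taken from the letter:

```python
from scipy.stats import chi2

def ramr_ci(observed, expected, alpha=0.05):
    """O/E (RAMR) ratio with an exact Poisson confidence interval.

    Assumption: the observed death count is Poisson-distributed.
    Exact limits for the count are derived from chi-squared
    quantiles and divided by the expected count; if the resulting
    interval excludes 1.0, the miscalibration is significant at
    level alpha.
    """
    lower = chi2.ppf(alpha / 2, 2 * observed) / 2 if observed > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (observed + 1)) / 2
    return observed / expected, lower / expected, upper / expected

# Figures quoted above: 5553 patients, 4.18% observed vs 3.95%
# expected mortality, i.e. roughly 232 observed vs 219.3 expected deaths.
ratio, lo, hi = ramr_ci(232, 219.3)
```

With these numbers the ratio is about 1.06 and the interval straddles 1.0, so by this criterion the underprediction would not reach statistical significance.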
