Assessing predictive accuracy: How to compare Brier scores

Donald A Redelmeier, Daniel A Bloch, David H Hickam

doi:10.1016/0895-4356(91)90146-z

Abstract

Several investigators have used the Brier index to measure the predictive accuracy of a set of medical judgments; the Brier scores of different raters who have evaluated the same patients provide a measure of relative accuracy. However, such comparisons may be difficult to interpret because of the lack of a statistical test for differentiating between two Brier scores. To demonstrate a method for addressing this issue, we analyzed the judgments of five medical students, each of whom independently evaluated the same 25 patients with recurrent chest pain. Using the method, we determined that two of the students gave judgments that were incompatible with the actual observed outcomes (p < 0.05); of the three remaining students, we detected a significant difference between two (p < 0.05). These results differed from those of receiver operating characteristic curve area analysis, another technique used to evaluate predictive accuracy. We suggest that the proposed method can provide a useful tool for investigators using the Brier index to compare how well clinicians express uncertainty using probability judgments.
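
For readers unfamiliar with the quantity being compared, the following minimal sketch shows how a Brier score is usually computed for binary outcomes: the mean squared difference between each stated probability and the observed outcome (1 if the event occurred, 0 otherwise), with lower scores indicating better accuracy. The function name `brier_score` and all of the numbers are illustrative assumptions, not data from the study, and the sketch does not implement the statistical test proposed in the paper, which the abstract does not specify.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between probability judgments and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("at least one judgment is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: two raters judge the same five patients.
outcomes = [1, 0, 0, 1, 0]             # observed outcomes (made up)
rater_a = [0.9, 0.2, 0.1, 0.7, 0.3]    # rater A's probability judgments (made up)
rater_b = [0.6, 0.5, 0.4, 0.5, 0.5]    # rater B's probability judgments (made up)

score_a = brier_score(rater_a, outcomes)
score_b = brier_score(rater_b, outcomes)
print(f"Rater A: {score_a:.3f}, Rater B: {score_b:.3f}")
```

In this made-up example rater A obtains the lower score. Whether such a difference is larger than chance would produce is the question the abstract says the proposed method is meant to address.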
