Developing a prognostic model for biomedical applications typically requires mapping an individual's set of covariates to a measure of their risk of experiencing the event to be predicted. Many scenarios, however, especially those involving adverse pathological outcomes, are better described by explicitly accounting for the timing of these events as well as their probability. In such cases, traditional classification or ranking metrics may therefore be inadequate to inform model evaluation or selection. To address this limitation, it is common practice to reframe the problem in the context of survival analysis and resort instead to the concordance index (C-index), which summarises how well a predicted risk score describes an observed sequence of events. Interpreting the C-index in a practically meaningful way, however, presents several difficulties and pitfalls. Specifically, we identify two main issues: i) the C-index remains implicitly, and subtly, dependent on time, and ii) its relationship with the number of subjects whose risk was incorrectly predicted is not straightforward. Failure to consider these two aspects may introduce undesirable biases into the evaluation process and even result in the selection of a suboptimal model. Here, we discuss ways to obtain a meaningful interpretation despite these difficulties. Aiming to assist experimenters regardless of their familiarity with the C-index, we start from an introductory-level presentation of its most popular estimator, highlighting the latter's temporal dependency and suggesting how it may be correctly used to inform model selection. We also address the nonlinearity of the C-index with respect to the number of correct risk predictions, developing a simplified framework that enables an easier interpretation and quantification of C-index improvements or deteriorations.
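To make the temporal dependency mentioned above concrete, the sketch below (a simplified Python implementation assuming NumPy; the function name and toy data are illustrative, not the paper's reference implementation) computes the C-index's most popular estimator, commonly attributed to Harrell, by counting concordant comparable pairs. Note how a pair is comparable only when the subject with the shorter follow-up actually experienced the event, which is one way censoring, and hence time, enters the metric implicitly.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Minimal sketch of Harrell's C-index estimator.

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    risk  : predicted risk scores (higher = earlier expected event)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant, ties, comparable = 0.0, 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            # A censored subject can only be the *later* member of a pair:
            # we never know when (or whether) its event would have occurred.
            continue
        for j in range(n):
            if time[i] < time[j]:  # subject i failed before j's follow-up ended
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1       # risk ordering matches event ordering
                elif risk[i] == risk[j]:
                    ties += 1             # tied risks conventionally count 1/2
    if comparable == 0:
        return float("nan")  # no comparable pairs (e.g. all subjects censored)
    return (concordant + 0.5 * ties) / comparable

# Toy example: three subjects, the last one censored.
# All three comparable pairs are correctly ordered, so the C-index is 1.0.
print(harrell_c_index(time=[2.0, 5.0, 6.0], event=[1, 1, 0], risk=[0.9, 0.4, 0.2]))
```

Because the set of comparable pairs is determined by the observed follow-up times and the censoring pattern, the same model evaluated on cohorts with different follow-up horizons can yield different C-index values, even if its risk ordering is unchanged.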