The area under the receiver operating characteristic (ROC) curve (AUC) is a standard metric for quantifying and comparing the performance of binary classifiers. A popular approach to estimating AUCs and their associated variability (the variance of a single AUC, or the full covariance matrix of multiple correlated AUCs) is the one proposed by DeLong et al. (1988), which is based on Mann-Whitney two-sample U-statistics. The bias of a variance estimator is an important factor in applications such as hypothesis testing and the construction of confidence intervals: a negatively biased variance estimator may lead to incorrect conclusions, whereas a positive bias is conservative and hence preferable. In this work, we show that the (co-)variance estimate in DeLong's approach is always positively biased. More specifically, the difference between the expectation of the estimated covariance matrix and the true covariance matrix is a positive semi-definite matrix. This bias is non-negligible when the sample size is small and diminishes quickly as the sample size increases. Our method relies on constructing, from the AUC kernel, a random variable whose (co-)variance matrix coincides with the bias, thereby establishing the claim. We also discuss alternative approaches to AUC variance estimation that may reduce the bias.
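For readers unfamiliar with the estimator under discussion, the following is a minimal sketch of the single-AUC case of DeLong's variance estimate as it is commonly formulated. It is an illustration under stated assumptions, not the authors' code: the function names `auc_kernel` and `delong_auc_variance` and the toy scores are hypothetical, ties are scored as 0.5, and higher scores are assumed to indicate the positive class.

```python
from statistics import mean, variance


def auc_kernel(x, y):
    """Mann-Whitney kernel: 1 if the positive score wins, 0.5 on a tie, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_variance(pos_scores, neg_scores):
    """Return (AUC estimate, DeLong-style variance estimate) for one classifier.

    pos_scores: scores of the m positive cases (hypothetical input name)
    neg_scores: scores of the n negative cases (hypothetical input name)
    """
    m, n = len(pos_scores), len(neg_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases in each class")

    # Kernel evaluated on every (positive, negative) pair.
    psi = [[auc_kernel(x, y) for y in neg_scores] for x in pos_scores]

    # Structural components: per-positive and per-negative averages of the kernel.
    v10 = [mean(row) for row in psi]
    v01 = [mean([psi[i][j] for i in range(m)]) for j in range(n)]

    # The AUC estimate is the average of the kernel over all pairs.
    auc = mean(v10)

    # Sample variances (divisors m - 1 and n - 1), scaled by the class sizes.
    var = variance(v10) / m + variance(v01) / n
    return auc, var


if __name__ == "__main__":
    # Toy scores, made up purely for illustration.
    auc, var = delong_auc_variance([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
    print(auc, var)
```

The sketch covers only the scalar variance. The multi-classifier covariance matrix replaces the two sample variances with sample covariance matrices of the per-classifier components.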