A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies

Martina Kottas,Antonia Zapf,Oliver Kuss

doi:10.1186/1471-2288-14-26

Abstract

BackgroundThe area under the receiver operating characteristic (ROC) curve, referred to as the AUC, is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold. There are many approaches for estimating the confidence interval for the AUC. However, all are relatively complicated to implement. Furthermore, many approaches perform poorly for large AUC values or small sample sizes.MethodsThe AUC is actually a probability. So we propose a modified Wald interval for a single proportion, which can be calculated on a pocket calculator. We performed a simulation study to compare this modified Wald interval (without and with continuity correction) with other intervals regarding coverage probability and statistical power.ResultsThe main result is that the proposed modified Wald intervals maintain and exploit the type I error much better than the intervals of Agresti-Coull, Wilson, and Clopper-Pearson. The interval suggested by Bamber, the Mann-Whitney interval without transformation and also the interval of the binormal AUC are very liberal. For small sample sizes the Wald interval with continuity has a comparable coverage probability as the LT interval and higher power. For large sample sizes the results of the LT interval and of the Wald interval without continuity correction are comparable.ConclusionsIf individual patient data is not available, but only the estimated AUC and the total sample size, the modified Wald intervals can be recommended as confidence intervals for the AUC. For small sample sizes the continuity correction should be used.

Highlights

The area under the receiver operating characteristic (ROC) curve, referred to as the area under the ROC curve (AUC), is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold
The ROC curve is a plot of sensitivity and one minus specificity for each possible threshold
Because all confidence intervals for the AUC are relatively complicated to implement, and some of them either do not maintain or do not exploit the type I error probability α for small sample sizes or large values for the AUC, we investigated alternatives

Summary

Introduction

The area under the receiver operating characteristic (ROC) curve, referred to as the AUC, is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold. Many approaches perform poorly for large AUC values or small sample sizes. If an appropriate threshold for the quantitative parameter has not yet been defined, the receiver operating characteristic (ROC) curve and in particular the area under this curve, are appropriate for evaluating the overall accuracy of the diagnostic test [1]. In the systematic review by Cochrane and Ebmeier of diffusion tensor imaging (DTI) as a candidate biomarker for the diagnosis of Parkinson disease, the median total sample size of the 21 selected studies was 32 (mean=39) [2]. For example in the systematic review by Wang et al the AUC’s of the different diagnostic tests were between 0.78 and 0.91 [3]

Objectives

Methods

Results