A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

André M Carrington, Hammad Qazi, Douglas G Manuel, Helen H Chen, Paul W Fieguth, Franz Mayr, Andreas Holzinger

doi:10.1186/s12911-019-1014-6

Abstract

Background: In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. When they are used with imbalanced data, however, only part of the ROC curve and AUC are informative. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. These alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives.

Methods: We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and are validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided.

Results: Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures.

Conclusions: The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set, but this paper focuses on imbalanced data with low prevalence.

Future work: Future work with our proposed measures may demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas, and combine them with other ROC measures and techniques.

Highlights

In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives.
Since the concordance matrix demonstrates an exact correspondence between c and AUC, we expect that a proper partial c statistic in the concordance matrix will correspond to the concordant partial AUC we proposed in the introduction.
Class imbalance in data has traditionally prompted the use of alternatives to the AUC, including partial measures or the area under the precision-recall curve (PRC), known as the AUPRC. However, the partial area under the ROC curve (pAUC), the standardized partial area (sPA) and the AUPRC are biased toward positives and are each one half of a pair.
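
The correspondence between c and AUC mentioned above can be checked numerically on a small example. The following is a minimal sketch in Python, not the authors' code: the labels and scores are made-up illustrative values, and the helper names (`c_statistic`, `roc_points`, `trapezoid_auc`) are hypothetical. It counts correctly ranked positive-negative pairs, with ties counted as one half, and compares the result with the trapezoidal area under the empirical ROC curve.

```python
from itertools import product


def c_statistic(labels, scores):
    """Fraction of positive-negative pairs in which the positive scores higher.

    Tied pairs count as one half, the usual convention for the c statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


def roc_points(labels, scores):
    """Empirical ROC points (FPR, TPR), one per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Illustrative data only (assumed, not taken from the paper); includes one tie.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.4, 0.2]

c = c_statistic(labels, scores)
auc = trapezoid_auc(roc_points(labels, scores))
print(f"c statistic = {c}, AUC = {auc}")
```

On this toy data both quantities agree. The sketch covers only the whole measures; the partial c statistic and the concordant partial AUC follow the definitions given in the paper's methods, which are not reproduced here.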

Summary

Introduction

In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. The ability of a classifier or diagnostic test to discriminate between actual positives and negatives is often assessed by its curve in an ROC plot and by the AUC. When data are imbalanced with few positives relative to negatives (i.e. a low prevalence or incidence of a disease in the total population), we need high specificity to avoid a large number of false positives, and ideally high sensitivity as well. Alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. These alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. Neither alternative fully represents the information in the part of the curve that is of interest.
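
To make the point about specificity concrete, the arithmetic can be worked through for a hypothetical screening scenario. The sketch below is illustrative only: the population size, prevalence, sensitivity and specificity values are assumptions chosen for the example, not figures from the paper, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix counts for a test at a fixed threshold.

    n_pos, n_neg: numbers of actual positives and actual negatives.
    Returns (TP, FP, FN, TN), rounded to whole cases.
    """
    tp = round(sensitivity * n_pos)
    fn = n_pos - tp
    tn = round(specificity * n_neg)
    fp = n_neg - tn
    return tp, fp, fn, tn


# Assumed scenario: 10,000 people, 1% prevalence, sensitivity fixed at 0.80.
n_pos, n_neg = 100, 9_900

results = {}
for spec in (0.90, 0.99):
    tp, fp, fn, tn = confusion_counts(n_pos, n_neg, 0.80, spec)
    results[spec] = (tp, fp, fn, tn)
    precision = tp / (tp + fp)
    print(f"specificity={spec:.2f}: TP={tp}, FP={fp}, precision={precision:.3f}")
```

Under these assumed numbers, raising specificity from 0.90 to 0.99 reduces false positives from 990 to 99 while true positives stay at 80. This is why, at low prevalence, the high-specificity part of the ROC curve is the region of practical interest.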

Methods

Discussion

Conclusion