Statistical and clinical perspectives on risk models can be very different, even when there is agreement on the objective: to develop accurate and precise risk estimates for rational, effective, and cost-effective prevention strategies (1). For example, Katki et al. (2) estimated the risk of cervical precancer (cervical intraepithelial neoplasia grade 3) or worse conditional on the results of concurrent testing for human papillomavirus (HPV) and cervical cytology (cotesting) in 300,000 women 30 years of age or older who were enrolled in a large health maintenance organization. The nearly equivalent risks over 5 years for all women who were HPV negative (3.8 per 100,000 per year) and for those who were both HPV negative and cytology negative (3.2 per 100,000 per year) might justify a prevention program in which primary screening is based on results of HPV testing alone. In contrast, among 17,000 HPV-positive women, the 5-year risk of cervical intraepithelial neoplasia grade 3 or worse was substantially lower in the 12,000 women with normal cytology (5.9%) than in the 5,000 women with abnormal cytology (12.1%). This suggests that cytology testing might improve decisions about the management of women who test positive for HPV.

As exemplified by Katki et al. (2), the value of an additional clinical test is a function of simple risks, or generalizations of predictive values, together with the proportions of patients with the different clinical presentations. Both a risk model and a clinical test are risk classifiers, or risk stratifiers. For the evaluation of risk models, as Cook (3) notes in one commentary discussing an article by Pencina et al. (4) in this issue of the Journal, none of the area under the receiver operating characteristic curve (AUC), the integrated discrimination improvement (IDI), and the net reclassification improvement (NRI) is necessary when direct evaluation of the clinical performance of a program based on the risk model is possible. The measure of clinical utility is the difference in rates or risks of clinical outcomes between intervention strategies whose assignment rules are based on the different models. The differences in risk and the distribution of the possible clinical presentations under the models are measures of risk stratification, or of the extra precision resulting from the added complexity of a model. Of course, clinical utility also depends on the efficacy of available interventions; a model with perfect predictions has no clinical utility without an effective intervention.

Pencina et al. (4) discussed measures for evaluating risk models by purely statistical assessment, without consideration of interventions of known efficacy, which evaluation of clinical utility requires. They advocated reporting the IDI and the NRI in addition to the most commonly used measure, the AUC. The AUC is a nonparametric statistic (the Mann–Whitney U statistic, equivalent to the Wilcoxon rank sum statistic) for testing the equality of the distributions of estimated risk in cases and controls. The IDI adds to the AUC because an estimate of the difference in means adds information that is not available from a statistic based on ranks rather than actual values. NRIs can capture effects of differences in distribution, including spread and skewness, that are not reflected in the difference in means.
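To make these 3 measures concrete, the minimal sketch below (an illustration added here, not code from any of the articles under discussion) computes the AUC as a Mann–Whitney statistic, the IDI as the change in discrimination slope (the difference in mean estimated risk between cases and controls), and a 2-category NRI at a single risk threshold. The simulated risks, the threshold of 0.5, and all variable names are assumptions made for illustration.

```python
# Minimal sketch of AUC, IDI, and a 2-category NRI for a baseline risk
# model and an expanded model. All data and names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Simulated outcomes (1 = case, 0 = control) and estimated risks from a
# baseline model (p_old) and an expanded model (p_new); the expanded
# model separates cases from controls slightly better by construction.
y = rng.integers(0, 2, size=1000)
p_old = np.clip(0.25 * y + rng.normal(0.30, 0.15, 1000), 0.01, 0.99)
p_new = np.clip(0.35 * y + rng.normal(0.27, 0.15, 1000), 0.01, 0.99)

def auc(p, y):
    """AUC as the Mann-Whitney statistic: the proportion of
    case-control pairs in which the case has the higher estimated risk."""
    cases, controls = p[y == 1], p[y == 0]
    diffs = cases[:, None] - controls[None, :]
    return ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / diffs.size

def idi(p_new, p_old, y):
    """IDI: change in the discrimination slope, i.e., in the difference
    in mean estimated risk between cases and controls."""
    slope = lambda p: p[y == 1].mean() - p[y == 0].mean()
    return slope(p_new) - slope(p_old)

def nri(p_new, p_old, y, threshold=0.5):
    """2-category NRI at one threshold: net upward reclassification of
    cases plus net downward reclassification of controls."""
    up = (p_new > threshold) & (p_old <= threshold)
    down = (p_new <= threshold) & (p_old > threshold)
    cases, controls = y == 1, y == 0
    return (up[cases].mean() - down[cases].mean()) + \
           (down[controls].mean() - up[controls].mean())

print(f"delta AUC = {auc(p_new, y) - auc(p_old, y):+.3f}")
print(f"IDI       = {idi(p_new, p_old, y):+.3f}")
print(f"NRI       = {nri(p_new, p_old, y):+.3f}")
```

Because the AUC uses only the ranks of the estimated risks, the IDI and NRI can change even when the AUC does not, which is the sense in which they add information.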
As noted by Pencina et al. (4), 2 of the American Heart Association's 2009 criteria for evaluation of novel markers of cardiovascular risk are “documentation of incremental information when added to standard risk markers” and “assessment of effects on patient management and outcomes” (5, p. 2408). As noted by Kerr et al. (6), assessment of the marginal increase in utility from an improved model requires an objective assessment of effects on patient management and outcomes; the incremental-information criterion must therefore be a purely statistical measure if it is to be separate from clinical utility.

The difference between the clinical and statistical views is manifested in one particularly important way: the role of variability in risk. Less variability, or more homogeneity, of risk within cases and within controls increases the AUC and other measures of discrimination; in contrast, greater variation in risk across the study population for which decisions are to be made increases the potential for assigning to those at extreme risk an intervention different from the one appropriate for a person at average risk. In other words, small variance in risk conditional on disease status increases discrimination, but large unconditional variance increases the potential for clinical utility (a distinction the simulation sketch below makes concrete). Of course, the variation must be real, not a consequence of random variation in the risk estimates or of misclassification of markers in the model.

This exchange of views (1, 2, 6, 7) highlights the differences between the clinical and statistical perspectives on risk models. The connection between the 2 perspectives is not yet clear to all.
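As a closing illustration of the variance point (again a hypothetical sketch, not an analysis from the exchange), the simulation below applies a monotone compression of estimated risks toward the population mean. Because the AUC depends only on ranks, it is unchanged; but the shrunken unconditional variance means that no one crosses a hypothetical high-risk treatment threshold, so the compressed risks stratify patients less usefully. The Beta(2, 8) risk distribution and the threshold of 0.4 are arbitrary choices for illustration.

```python
# Sketch: a monotone compression of risks preserves the AUC (rank-based)
# but reduces unconditional variance and hence risk stratification.
import numpy as np

rng = np.random.default_rng(1)
risk = rng.beta(2, 8, size=10_000)   # heterogeneous true risks, mean 0.2
y = rng.binomial(1, risk)            # outcomes generated from those risks

# Shrink risks toward the mean; the ordering of patients is preserved.
compressed = 0.2 + 0.2 * (risk - 0.2)

def auc(p, y):
    cases, controls = p[y == 1], p[y == 0]
    return (cases[:, None] > controls[None, :]).mean()

threshold = 0.4                      # hypothetical treatment threshold
print("AUC (original):  ", round(auc(risk, y), 3))
print("AUC (compressed):", round(auc(compressed, y), 3))
print("Fraction above threshold (original):  ", (risk > threshold).mean())
print("Fraction above threshold (compressed):", (compressed > threshold).mean())
```

Both risk vectors rank patients identically, yet only the more dispersed one identifies any patients whose risk is high enough to warrant an intervention different from the one given to a person at average risk.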