Abstract

Background: When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model.

Methods: An analytical expression for the c-statistic was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition.

Results: Under the assumption of binormality with equality of variances, the c-statistic is given by a standard normal cumulative distribution function whose argument depends on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic is given by a standard normal cumulative distribution function whose argument depends on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition.

Conclusions: The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
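The relationships described in the Results can be illustrated with a short Python sketch. The abstract does not state the closed-form expressions, so the formulas below are an assumption: they are the standard binormal results, c = Φ(σβ/√2) for equal variances and c = Φ((μ1 − μ0)/√(σ0² + σ1²)) for unequal variances, which match the dependence the abstract describes. The function names, the simulation settings, and the convention of counting ties as one half are illustrative choices, not the authors' code.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def predicted_c_equal_var(sigma, beta):
    """Predicted c under binormality with a common SD `sigma` and
    log-odds ratio `beta` per unit of the explanatory variable
    (assumed form: Phi(sigma * beta / sqrt(2)))."""
    return PHI(sigma * beta / math.sqrt(2))


def predicted_c_unequal_var(mu0, sd0, mu1, sd1):
    """Predicted c under binormality with group-specific means and SDs
    (assumed form: Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2)), i.e. Phi(d / sqrt(2))
    where d is the standardized difference)."""
    return PHI((mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))


def empirical_c(x_events, x_nonevents):
    """Proportion of event/non-event pairs in which the event subject has
    the larger value; ties count one half (a common convention)."""
    score = 0.0
    for xe in x_events:
        for xn in x_nonevents:
            if xe > xn:
                score += 1.0
            elif xe == xn:
                score += 0.5
    return score / (len(x_events) * len(x_nonevents))


def simulate_empirical_c(sigma, beta, n_per_group=400, seed=0):
    """Draw a binormal sample and return its empirical c-statistic.
    With a common SD, a log-odds ratio of `beta` implies a mean
    difference of beta * sigma^2 between the two groups."""
    rng = random.Random(seed)
    x_nonevents = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
    x_events = [rng.gauss(beta * sigma ** 2, sigma) for _ in range(n_per_group)]
    return empirical_c(x_events, x_nonevents)
```

Under these assumptions, doubling the standard deviation while halving the log-odds ratio leaves the predicted c-statistic unchanged, which is one way to see why the odds ratio alone does not determine discrimination. The sketch assumes a positive log-odds ratio, so that larger values of the explanatory variable correspond to higher predicted probabilities.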

Highlights

  • When outcomes are binary, the c-statistic is a standard measure of the predictive accuracy of a logistic regression model

  • We examined the ability of our derived formulas to predict the c-statistic for two logistic regression models in a sample of subjects hospitalized with acute myocardial infarction (AMI)

  • The improved accuracy of prediction of the c-statistic for the multivariable model is likely due to the linear predictor having a distribution closer to normal than the distribution of age



Introduction

The c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Two aspects of predictive accuracy are commonly distinguished: calibration and discrimination. Calibration refers to the agreement between observed outcomes and predictions, while discrimination refers to the ability of model predictions to discriminate between those with and those without the outcome [1,2]. The discriminative ability of a logistic regression model is frequently assessed using the concordance (or c) statistic, a unitless index denoting the probability that a randomly selected subject who experienced the outcome will have a higher predicted probability of the outcome than a randomly selected subject who did not experience it. Equivalently, among all pairs consisting of one subject who experienced the outcome and one who did not, the c-statistic is the proportion in which the subject who experienced the outcome had the higher predicted probability [3]. It is related to Somers' $D_{xy}$ rank correlation between the predicted probability of the outcome and the observed outcome: $D_{xy} = 2(c - 0.5)$ [3].
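The pairwise definition of the c-statistic and its relation to Somers' D can be made concrete with a minimal Python sketch. The function names and the toy data are hypothetical, and counting tied predictions as one half is a common convention assumed here rather than something stated in the text.

```python
def c_statistic(probs, outcomes):
    """Concordance statistic from predicted probabilities and 0/1 outcomes.

    Considers every pair made of one subject with the outcome and one
    without, and returns the proportion of pairs in which the subject
    with the outcome has the higher predicted probability.
    Ties count one half (assumed convention).
    """
    event_probs = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevent_probs = [p for p, y in zip(probs, outcomes) if y == 0]
    if not event_probs or not nonevent_probs:
        raise ValueError("need at least one subject with and one without the outcome")
    score = 0.0
    for pe in event_probs:
        for pn in nonevent_probs:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(event_probs) * len(nonevent_probs))


def somers_dxy(c):
    """Somers' D_xy computed from the c-statistic: D_xy = 2 * (c - 0.5)."""
    return 2.0 * (c - 0.5)


# Toy data: two subjects without the outcome, two with it.
probs = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
c = c_statistic(probs, outcomes)  # 3 of the 4 pairs are concordant
d = somers_dxy(c)
```

In the toy data, the only discordant pair is the subject with the outcome and a predicted probability of 0.35 against the subject without the outcome at 0.40, so three of four pairs are concordant. A c-statistic of 0.5 (no discrimination) maps to a Somers' D of 0, and perfect discrimination (c = 1) maps to 1.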
