Abstract
Purpose
The receiver operating characteristic (ROC) curve is a common evaluation metric for risk scores and classifiers for mortality and adverse events. However, ROC can give a misleadingly optimistic view of a classifier's performance when the data are imbalanced, for example when the proportion of adverse events is very small. This study illustrates this shortcoming of ROC through a case study of a classifier for right heart failure (RHF) after left ventricular assist device (LVAD) implantation. It also demonstrates the utility of the precision-recall curve (PRC) as a supplemental evaluation tool.
Methods
This study included 11,967 patients recorded in INTERMACS who received a continuous-flow LVAD between 2006 and 2016 (mean age 57 years; 21% female, 79% male). The incidence of RHF at 1 year was only 9% (1,079 patients). The data were randomly split into a training set (60%) and a test set (40%). A logistic regression model was developed on the training data to predict post-LVAD RHF.
Results
The ROC curve in Fig. 1A indicates good performance of the RHF classifier, with an area under the curve (AUC) of 0.83. In contrast, the PRC in Fig. 1B has an AUC of 0.33. It shows that the classifier's precision drops rapidly from 1 (100%) to 0.4 (40%) as recall (sensitivity) rises just above 0%. The gray dot in Fig. 1A marks the optimized operating point at which sensitivity and specificity are equal, at approximately 76-77%. At the same sensitivity (76%), however, the classifier's precision is only 23% (gray point in Fig. 1B). In other words, only 23% of the patients predicted to develop RHF actually do (true RHF). The large majority (77%) of patients predicted to experience RHF are misclassified (false RHF). ROC did not capture this large number of false RHF predictions because specificity divides the false RHF count by the very large number of observed patients who are free from RHF, which makes those errors look small.
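The low precision at the equalized operating point follows from Bayes' rule: precision = sensitivity × π / (sensitivity × π + (1 − specificity) × (1 − π)), where π is the prevalence of RHF. The Python sketch below plugs in the rounded values reported above. The helper `precision_from_rates` is a hypothetical name used only for illustration. The sketch also assumes that the test-set prevalence equals the cohort's 1-year incidence (1,079 / 11,967), which the abstract does not state. It is therefore an approximate consistency check, not a reproduction of the study's analysis.

```python
def precision_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence                    # P(predicted RHF and true RHF)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(predicted RHF and no RHF)
    return true_pos / (true_pos + false_pos)


# Assumption: the test set has the same RHF prevalence as the full cohort (about 9%).
prevalence = 1079 / 11967

# Operating point where sensitivity and specificity are roughly equal (76-77%).
for spec in (0.76, 0.77):
    prec = precision_from_rates(0.76, spec, prevalence)
    print(f"sensitivity=0.76 specificity={spec:.2f} -> precision={prec:.3f}")

# The same arithmetic expressed as expected counts per 1,000 patients.
n = 1000
tp = 0.76 * prevalence * n               # expected true RHF alerts
fp = (1 - 0.76) * (1 - prevalence) * n   # expected false RHF alerts (specificity 0.76)
print(f"per {n} patients: true RHF alerts={tp:.0f}, false RHF alerts={fp:.0f}")
```

Both specificity values give a precision of roughly 0.24 to 0.25. This agrees with the reported 23% once rounding of the operating point is allowed for. Per 1,000 patients, about 69 true RHF alerts come with about 218 false ones, even though specificity is 76%.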
Conclusion
The ROC curve can portray an overly optimistic picture of the performance of a classifier or risk score when applied to imbalanced data. The PRC provides more informative insight into classifier performance by focusing on the minority class.
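A small, standard-library-only Python sketch can illustrate why the PRC is more sensitive to class imbalance than ROC. It uses made-up scores, not INTERMACS data. The functions `roc_auc` and `average_precision` are illustrative helpers. ROC AUC is computed with the rank-based (Mann-Whitney) formulation. The area under the PRC is estimated as average precision, which is one common estimator; the abstract does not say which estimator the study used. Replicating each event-free score k times makes the event rarer while leaving the classifier's score distributions unchanged.

```python
from typing import List, Tuple


def roc_auc(pos: List[float], neg: List[float]) -> float:
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(pos: List[float], neg: List[float]) -> float:
    """Area under the PRC estimated as average precision (assumes no positive/negative score ties)."""
    labelled: List[Tuple[float, int]] = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    labelled.sort(key=lambda t: t[0], reverse=True)  # rank patients by descending risk score
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(labelled, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at the rank where each true event is recalled
    return total / len(pos)


# Hypothetical classifier scores (not study data).
pos_scores = [0.9, 0.8, 0.6, 0.4]                         # patients with the event
neg_scores = [0.85, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02]  # event-free patients

for k in (1, 10):
    neg_k = neg_scores * k  # same score distribution, k times as many event-free patients
    prevalence = len(pos_scores) / (len(pos_scores) + len(neg_k))
    print(f"k={k:2d} prevalence={prevalence:.3f} "
          f"ROC AUC={roc_auc(pos_scores, neg_k):.4f} "
          f"PR AUC (AP)={average_precision(pos_scores, neg_k):.3f}")
```

ROC AUC is exactly 0.8125 for both values of k. Average precision falls from about 0.71 to about 0.35 as prevalence drops from 33% to under 5%. The reason is that every event-free patient who outscores a true event adds k false positives to precision's denominator, whereas the false positive rate grows only in proportion to the larger negative count.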