Abstract

In many two-class problems in automated classification and information retrieval, the classes are imbalanced, and the separation between the positive and negative classes is large. The precision-recall (PR) curve has been suggested as an alternative to the receiver operating characteristic (ROC) curve to characterize the performance of automated systems when the classes are imbalanced, and the area under the precision-recall curve (AUCPR) has been suggested as an alternative performance measure to the area under the ROC curve (AUCROC). The AUCPR and the AUCROC are distinct measures of performance, even though the relationship between the precision-recall and ROC curves is well known. In this study, we compared the statistical power of the AUCPR to that of the AUCROC. Our results indicate that the AUCPR can offer a small statistical advantage when the prevalence is low and the separation between the positive and negative classes is large. When the data set is more balanced or the separation between the classes is low or moderate, the AUCROC has slightly higher power.
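To make the two measures concrete, the following minimal Python sketch computes both on one simulated data set. It is an illustration only and not the procedure used in this study: the Gaussian score model, the helper names (`auc_roc`, `auc_pr`, `simulate`), the parameter values, and the use of average precision as the AUCPR estimator are all assumptions introduced here.

```python
import numpy as np

def auc_roc(labels, scores):
    """AUCROC via the Mann-Whitney U statistic (assumes no tied scores)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Rank all scores from 1 (lowest) to n (highest).
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_pr(labels, scores):
    """AUCPR estimated as average precision (an assumed choice of estimator)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Walk down the ranking from the highest score to the lowest.
    hits = labels[np.argsort(scores)[::-1]]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision observed at each positive case.
    return precision[hits].mean()

def simulate(n, prevalence, separation, rng):
    """Hypothetical score model: unit-variance Gaussians whose means differ by `separation`."""
    labels = rng.random(n) < prevalence
    scores = rng.normal(loc=separation * labels, scale=1.0)
    return labels, scores

rng = np.random.default_rng(0)
# Low prevalence and large separation, the regime described in the abstract.
labels, scores = simulate(n=5000, prevalence=0.05, separation=2.0, rng=rng)
print(f"AUCROC = {auc_roc(labels, scores):.3f}")
print(f"AUCPR  = {auc_pr(labels, scores):.3f}")
```

Under these assumptions, a comparison of statistical power could be approximated by repeating the simulation many times for two systems with different separations and counting how often each measure detects the difference. That loop is omitted here to keep the sketch short.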
