Abstract

A common challenge in performing classification and comparing classifiers is selecting a suitable performance metric. This is particularly important when the data suffer from class imbalance. The Area Under the Receiver Operating Characteristic Curve (AUC) has commonly been used by the machine learning community in such situations, and researchers have recently started to use the Matthews Correlation Coefficient (MCC), especially in biomedical research. However, no empirical study has been conducted to compare the suitability of the two metrics. The aim of this study is to provide insights into how AUC and MCC compare to each other when used with classical machine learning algorithms over a range of imbalanced datasets. In our study, we utilize earlier-proposed criteria for comparing metrics based on the degree of consistency and the degree of discriminancy to compare AUC against MCC. We carry out experiments using four machine learning algorithms on 54 imbalanced datasets, with imbalance ratios ranging from 1% to 10%. The results demonstrate that AUC and MCC are statistically consistent with each other; however, AUC is more discriminating than MCC. The same observation holds when evaluating on 23 balanced datasets. This suggests that AUC is a better measure than MCC for evaluating and comparing classification algorithms.
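
To illustrate the two metrics being compared, the following minimal sketch (not taken from the paper) computes AUC and MCC for a single classifier on a synthetic imbalanced dataset using scikit-learn; the dataset parameters and the choice of logistic regression are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, matthews_corrcoef

# Synthetic binary dataset with ~5% positives (within the paper's 1%-10% range).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC is computed from ranking scores; MCC from hard (thresholded) predictions.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
mcc = matthews_corrcoef(y_test, clf.predict(X_test))
print(f"AUC = {auc:.3f}, MCC = {mcc:.3f}")
```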
