The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation.

Davide Chicco,Giuseppe Jurman,Matthijs J Warrens

doi:10.7717/peerj-cs.623

Abstract

Regression analysis makes up a large part of supervised machine learning, and consists of the prediction of a continuous independent target from a set of other predictor variables. The difference between binary classification and regression is in the target range: in binary classification, the target can have only two values (usually encoded as 0 and 1), while in regression the target can have multiple values. Even if regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these rates share a common drawback: since their values can range between zero and +infinity, a single value of them does not say much about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two rates that actually generate a high score only if the majority of the elements of a ground truth group has been correctly predicted: the coefficient of determination (also known as R-squared or R2) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R2 and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest the usage of R-squared as standard metric to evaluate regression analyses in any scientific domain.

Highlights

The role played by regression analysis in data science cannot be overemphasised: predicting a continuous target is a pervasive task in practical terms, and at a conceptual level
We know for example that a negative coefficient of determination and a symmetric mean absolute percentage error (SMAPE) equal to 1.9 clearly correspond to a regression which performed poorly, but we do not have a specific value for mean absolute error (MAE), mean square error (MSE), root of mean square error (RMSE) and mean absolute percentage error (MAPE) that indicates this outcome
As mentioned earlier, each value of MAE, MSE, RMSE and MAPE communicates the quality of the regression only relatively to other regression performances, and not in an absolute manner, like R-squared and SMAPE do

Summary

Introduction

The role played by regression analysis in data science cannot be overemphasised: predicting a continuous target is a pervasive task in practical terms, and at a conceptual level. The reference landscape is far wider: the aforementioned considerations stimulated a steady flow of studies investigating more philosophically oriented arguments (Allen, 2004; Berk, 2004), or deeper analysis of implications related to learning (Bartlett et al, 2020). Given the aforementioned overall considerations, it comes as no surprise that, to what happened for binary classification, a plethora of performance metrics have been defined and are currently in use for evaluating the quality of a regression model (Shcherbakov et al, 2013; Hyndman & Koehler, 2006; Botchkarev, 2018b, Botchkarev, 2018a, Botchkarev, 2019). The parallel with classification goes even further: in the scientific community, a shared consensus on a preferential metric is far from being reached, concurring to making comparison of methods and results a daunting task

Methods

Results

Conclusion