Abstract

In this paper, we study the limitations of Machine Learning (ML) algorithms for predicting juvenile recidivism. In particular, we analyze the trade-off between predictive performance and fairness. To that end, we evaluate the fairness of ML models in conjunction with SAVRY, a structured professional risk assessment framework, on a novel dataset originating in Catalonia. In terms of accuracy in predicting recidivism, the ML models slightly outperform SAVRY, and the results improve with more data or more features available for training (AUC-ROC of 0.64 with SAVRY vs. AUC-ROC of 0.71 with ML models). However, across three fairness metrics used in other studies, we find that SAVRY is in general fair, while the ML models tend to discriminate against male defendants, foreigners, or people of specific national groups. For instance, foreigners who did not recidivate are almost twice as likely to be wrongly classified as high risk by ML models as Spanish nationals. Finally, we discuss potential sources of this unfairness and explain them by combining ML interpretability techniques with a thorough data analysis. Our findings explain why ML techniques can lead to unfairness in data-driven risk assessment, even when protected attributes are not used in training.
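To make the reported quantities concrete, the following is a minimal sketch of how the two kinds of measurement mentioned above can be computed: AUC-ROC for predictive performance, and per-group false positive rates for the error-rate disparity described for foreigners vs. Spanish nationals. The synthetic data, feature set, group encoding, and 0.5 decision threshold are illustrative assumptions, not the paper's actual dataset, models, or evaluation protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a recidivism dataset: features, a binary
# recidivism label, and a binary group indicator (hypothetically,
# 1 = foreigner, 0 = Spanish national). All names are illustrative.
n = 2000
X = rng.normal(size=(n, 5))
group = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * group + rng.normal(size=n) > 0).astype(int)

# Train without the protected attribute, mirroring the setting in
# which unfairness can still arise even though `group` is unused.
model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]
pred = (scores >= 0.5).astype(int)  # "high risk" above an assumed threshold

# Predictive performance, as in the abstract's AUC-ROC comparison.
print("AUC-ROC:", round(roc_auc_score(y, scores), 3))

# False positive rate per group: among non-recidivists, the fraction
# wrongly flagged as high risk. A large ratio between the two groups
# is the kind of disparity the abstract reports.
for g in (0, 1):
    mask = (group == g) & (y == 0)
    print(f"group={g}  FPR={pred[mask].mean():.3f}")
```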
