Mitigating bias in deep learning: training unbiased models on biased data for the morphological classification of galaxies

Esteban Medina-Rosales, Guillermo Cabrera-Vives, Christopher J Miller

doi:10.1093/mnras/stae1088

Abstract

Galaxy morphologies and their relation to physical properties have long been a subject of study. Most galaxy morphology catalogues have been labelled by human annotators or by machine learning models trained on human-labelled data. Human-generated labels have been shown to contain biases that depend on observational properties of the data, such as image resolution. These biases are independent of the annotators; that is, they are present even in catalogues labelled by experts. In this work, we demonstrate that training deep learning models on biased galaxy data produces biased models, meaning that the biases in the training data are transferred to the models' predictions. We also propose a method for training deep learning models that accounts for this inherent labelling bias, yielding a de-biased model even when training on biased data. We show that models trained using our deep de-biasing method are capable of reducing the bias of human-labelled data sets.

Full Text