Pre-trained Generative Adversarial Networks (GANs) provide rich information that can benefit various downstream tasks. However, GAN training can absorb pre-existing biases in the historical data, and these biases are not only inherited but also amplified when the generated data are used for prediction. Most existing bias-mitigation methods rely on pre-processing or in-processing strategies that retrain the GAN, which requires access to the raw data and increases the time and computational cost of training a fully expressive GAN. We therefore propose a novel post-processing mechanism that achieves fair GAN outputs through online prior perturbation. Unlike traditional output-based offline processing, our method mitigates bias at the model's input by designing a prior perturbation network, termed the "prior perturber", which forms a combined network with the pre-trained GAN. Specifically, we introduce a bias prediction network, termed the "bias predictor", for online adversarial training against the prior perturber; this decouples the generated representations from the bias features and improves the fairness of downstream prediction results. In addition, the training process is independent of specific target labels, so the generated fair representations transfer well. Experimental evaluations on five real-world datasets validate the effectiveness of the proposed method, which achieves the best utility at the same fairness level among output-based baselines. On the fairness-utility trade-off, it improves the fairness of the original samples by 56.40% while reducing utility by only 2.60%.
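The adversarial interplay between the prior perturber and the bias predictor can be illustrated with a minimal linear sketch. All names here are illustrative assumptions, not the paper's implementation: a frozen random linear map stands in for the pre-trained generator, the perturber is a learned linear map on the prior, and the bias predictor is a logistic-regression head trained online against it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 500

# Synthetic priors; the bias attribute is encoded in the first prior coordinate.
Z = rng.normal(size=(n, d))
s = (Z[:, 0] > 0).astype(float)           # sensitive (bias) label

# A frozen linear map stands in for the pre-trained GAN generator (toy assumption).
A = rng.normal(size=(d, d))

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u, -30, 30)))

def train_probe(X, y, lr=0.5, steps=500):
    # Logistic-regression bias probe trained from scratch on representations X.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        g = sigmoid(X @ w + b) - y        # dBCE/dlogits
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(X, y, w, b):
    return float(((sigmoid(X @ w + b) > 0.5) == (y > 0.5)).mean())

# Baseline: a probe easily reads the bias off the unperturbed generator outputs.
X_raw = Z @ A.T
acc_biased = accuracy(X_raw, s, *train_probe(X_raw, s))

# Online adversarial phase: prior perturber z' = z + P z vs. bias predictor.
P = np.zeros((d, d))                      # prior-perturber parameters
w, b = np.zeros(d), 0.0                   # bias-predictor parameters
lr_pred, lr_pert = 0.1, 0.3
for _ in range(400):
    X = (Z + Z @ P.T) @ A.T               # generator output on perturbed priors
    g = sigmoid(X @ w + b) - s            # dBCE/dlogits
    # Predictor descends the BCE loss (tries to recover the bias label) ...
    w -= lr_pred * X.T @ g / n
    b -= lr_pred * g.mean()
    # ... while the perturber ascends the same loss (tries to hide it).
    dX = np.outer(g, w) / n               # dLoss/dX
    P += lr_pert * A.T @ dX.T @ Z         # chain rule through the frozen generator

acc_fair = accuracy((Z + Z @ P.T) @ A.T, s, w, b)
print(acc_biased, acc_fair)               # bias predictability before vs. after
```

Note the design point the sketch mirrors: only the perturber parameters `P` (and the online predictor) are updated; the generator `A` stays frozen throughout, matching the post-processing setting in which the pre-trained GAN is never retrained.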