Towards fair decision: A novel representation method for debiasing pre-trained models

Junheng He, Nankai Lin, Qifeng Bai, Haoyu Liang, Dong Zhou, Aimin Yang

doi:10.1016/j.dss.2024.114208

Abstract

Pretrained language models (PLMs) are frequently employed in Decision Support Systems (DSSs) due to their strong performance. However, recent studies have revealed that these PLMs can exhibit social biases, leading to unfair decisions that harm vulnerable groups. Sensitive information contained in sentences from the training data is the primary source of bias. Previously proposed debiasing methods based on contrastive disentanglement have proven highly effective. In these methods, PLMs disentangle sensitive information from non-sensitive information in the sentence embedding and then adapt only the non-sensitive information for downstream tasks. Such approaches hinge on having good sentence embeddings as input. However, recent research has found that most non-fine-tuned PLMs, such as BERT, produce poor sentence embeddings, and disentangling based on these embeddings leads to unsatisfactory debiasing results. Taking a finer-grained perspective, we propose PCFR (Prompt and Contrastive-based Fair Representation), a novel disentanglement method that integrates prompt learning and contrastive learning to debias PLMs. We employ prompt learning to represent information as sensitive embeddings and subsequently apply contrastive learning to contrast these information embeddings rather than the sentence embeddings. PCFR encourages similarity among different non-sensitive information embeddings and dissimilarity between sensitive and non-sensitive information embeddings. We mitigate gender and religion biases in two prominent PLMs, namely BERT and GPT-2. To comprehensively assess the debiasing efficacy of PCFR, we employ multiple fairness metrics. Experimental results consistently demonstrate the superior performance of PCFR compared to representative baseline methods. Additionally, when applied to specific downstream decision tasks, PCFR not only shows strong debiasing capability but also largely preserves task performance.
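The contrastive objective described above (pull non-sensitive information embeddings together, push them away from sensitive ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE-style form, the cosine similarity, the `temperature` value, and the toy two-dimensional vectors are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(non_sensitive, sensitive, temperature=0.5):
    """Illustrative InfoNCE-style loss (an assumed form, not taken from the paper).

    Each non-sensitive embedding is an anchor, every other non-sensitive
    embedding is a positive, and all sensitive embeddings are negatives.
    Requires at least two non-sensitive and one sensitive embedding.
    """
    losses = []
    for i, anchor in enumerate(non_sensitive):
        # Negatives for this anchor: similarity to every sensitive embedding.
        neg = sum(math.exp(cosine(anchor, s) / temperature) for s in sensitive)
        for j, positive in enumerate(non_sensitive):
            if i == j:
                continue
            pos = math.exp(cosine(anchor, positive) / temperature)
            losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses)


# Toy data (hypothetical): non-sensitive embeddings cluster together,
# far from the sensitive ones.
separated = contrastive_loss(
    non_sensitive=[[1.0, 0.0], [0.9, 0.1]],
    sensitive=[[-1.0, 0.0], [-0.9, -0.1]],
)

# Toy data (hypothetical): non-sensitive embeddings point in opposite
# directions and each lies close to a sensitive embedding.
entangled = contrastive_loss(
    non_sensitive=[[1.0, 0.0], [-1.0, 0.0]],
    sensitive=[[1.0, 0.1], [-1.0, 0.1]],
)

print(f"separated: {separated:.4f}, entangled: {entangled:.4f}")
```

Under these assumptions the loss is lower when the non-sensitive embeddings agree with each other and differ from the sensitive ones, which is the behaviour the abstract attributes to PCFR's training signal. In practice the embeddings would come from prompt-based PLM outputs rather than hand-written vectors.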

Full Text