Abstract

Demographic biases and social stereotypes are common in pretrained language models (PLMs), and fine-tuning for downstream applications can introduce new biases or amplify existing ones. Existing work separates debiasing from fine-tuning, which leaves a gap between intrinsic bias and application bias. In this work, we propose CauDebias, a debiasing framework that eliminates both kinds of bias by combining debiasing directly with fine-tuning; it can be applied to any PLM in downstream tasks. From a causal-invariance perspective, we distinguish the bias-relevant parts of a sentence (non-causal factors) from the label-relevant parts (causal factors). Specifically, we intervene on the non-causal factors across different demographic groups and then devise an invariant risk minimization loss that trades off bias mitigation against task accuracy. Experimental results on three downstream tasks show that CauDebias substantially reduces biases in PLMs while minimizing the impact on downstream task performance.
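
As a rough illustration of the two steps named above (intervention on non-causal factors, then an invariance-seeking loss), the following minimal Python sketch may help. It is not the CauDebias implementation: the word list `SWAP_PAIRS`, the function names, and the weight `lam` are hypothetical, and the penalty shown is a simple variance-of-risks term used as a stand-in for an invariant risk minimization loss, whose exact form the abstract does not give.

```python
import re
from statistics import mean, pvariance

# Hypothetical demographic term pairs (assumption; the abstract does not
# specify which terms are treated as non-causal factors).
SWAP_PAIRS = {"he": "she", "she": "he", "man": "woman", "woman": "man"}


def intervene(sentence: str) -> str:
    """Swap demographic terms and leave every other token unchanged."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = SWAP_PAIRS.get(word.lower(), word)
        # Preserve the capitalization of the first letter.
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", swap, sentence)


def invariance_objective(env_risks: list[float], lam: float = 1.0) -> float:
    """Mean task risk across environments plus lam times the variance
    of the per-environment risks (a stand-in penalty, not IRM itself)."""
    return mean(env_risks) + lam * pvariance(env_risks)
```

Here each demographic group (original and intervened sentences) would play the role of one environment, and `env_risks` would hold the task loss measured on each. A larger `lam` puts more weight on equal risk across groups and less on average task accuracy, which is one way to read the trade-off the abstract describes.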
