Saliency Guided Debiasing: Detecting and mitigating biases in LMs using feature attribution

Ratnesh Kumar Joshi, Arindam Chatterjee, Asif Ekbal

doi:10.1016/j.neucom.2023.126851

Abstract

Bias in machine learning models has gained increasing attention in recent years, as these models can reflect and even amplify biases present in their training data. One approach to mitigating bias is to identify and down-weight features that contribute disproportionately to model predictions, which can be accomplished using saliency techniques. Current debiasing methods often lose contextual information: the model tends to respond incorrectly even when gender information is present in the context, so although bias is reduced, performance (coreference resolution, fluency) is reduced as well. This paper explores data augmentation and saliency techniques to mitigate bias in natural language generation. Specifically, we apply the saliency technique SHAP (SHapley Additive exPlanations) to a model that has been debiased through data augmentation (switching gendered words with their counterparts), and then apply hard debiasing to remove the influential biased token. We build a dialogue-context test setup that evaluates bias and context relevance based on the presence of gendered words in the model-generated responses. Each response is evaluated against the gender information in the context to ensure that the model follows the gender given in the context. We demonstrate that this approach can effectively reduce the impact of biased features on model predictions while preserving overall model accuracy. Additionally, we discuss potential limitations and future directions for research in this area. Our findings suggest that saliency offers an avenue for addressing bias in machine learning.
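
Two steps described in the abstract, the gendered-word swap used for data augmentation and the check on gendered words in a generated response, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the word list `GENDER_PAIRS`, the helper names, and the pass/fail rule in `follows_context_gender` are hypothetical and are not taken from the paper. The SHAP attribution and hard-debiasing steps are not covered.

```python
import re

# Hypothetical swap list for illustration only; the abstract does not give
# the lexicon the authors used. Ambiguous pronouns such as "her" (him/his)
# are deliberately left out to keep the swap reversible.
GENDER_PAIRS = [
    ("he", "she"), ("man", "woman"), ("boy", "girl"),
    ("father", "mother"), ("son", "daughter"),
    ("brother", "sister"), ("husband", "wife"),
]

SWAP = {}
for male, female in GENDER_PAIRS:
    SWAP[male] = female
    SWAP[female] = male

MALE = {male for male, _ in GENDER_PAIRS}
FEMALE = {female for _, female in GENDER_PAIRS}
_WORD = re.compile(r"[A-Za-z]+")


def swap_gendered_words(text):
    """Return a counterfactual copy of text with listed gendered words swapped."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP.get(word.lower())
        if swapped is None:
            return word
        # Preserve an initial capital letter.
        return swapped.capitalize() if word[0].isupper() else swapped
    return _WORD.sub(repl, text)


def gender_counts(text):
    """Count listed male and female words in text."""
    tokens = [t.lower() for t in _WORD.findall(text)]
    return {
        "male": sum(t in MALE for t in tokens),
        "female": sum(t in FEMALE for t in tokens),
    }


def follows_context_gender(context, response):
    """Assumed rule: a response must not introduce a gender absent from context.

    If the context has no dominant gender, any gendered word in the response
    counts as a failure; otherwise only opposite-gender words do.
    """
    ctx = gender_counts(context)
    resp = gender_counts(response)
    if ctx["male"] == ctx["female"]:
        return resp["male"] == 0 and resp["female"] == 0
    other = "female" if ctx["male"] > ctx["female"] else "male"
    return resp[other] == 0
```

Under these assumptions, an augmented training set would pair each original utterance with `swap_gendered_words(utterance)`, and `follows_context_gender` would flag responses that use a gender not supported by the dialogue context.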

Published in: Neurocomputing
