Abstract

Word embeddings are the standard model for semantic and syntactic representations of words. Unfortunately, these models have been shown to exhibit undesirable word associations resulting from gender, racial, and religious biases. Existing post-processing methods for debiasing word embeddings are unable to mitigate gender bias hidden in the spatial arrangement of word vectors. In this paper, we propose RAN-Debias, a novel gender debiasing methodology that not only eliminates the bias present in a word vector but also alters the spatial distribution of its neighboring vectors, achieving a bias-free setting while maintaining minimal semantic offset. We also propose a new bias evaluation metric, Gender-based Illicit Proximity Estimate (GIPE), which measures the extent of undue proximity in word vectors resulting from the presence of gender-based predilections. Experiments based on a suite of evaluation metrics show that RAN-Debias significantly outperforms the state-of-the-art in reducing proximity bias (GIPE) by at least 42.02%. It also reduces direct bias, adding minimal semantic disturbance, and achieves the best performance in a downstream application task (coreference resolution).
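
The GIPE formula itself is not reproduced on this page. As a minimal, hypothetical sketch of the underlying idea (scoring a word by the fraction of its nearest neighbours that appear gender-loaded), consider the snippet below; the helper names (`proximity_bias_score`, `gender_component`), the threshold `theta`, and the neighbourhood size `k` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def gender_component(vec, gender_direction):
    """Cosine similarity of a word vector with a gender direction
    (e.g. the normalised difference of the 'he' and 'she' vectors)."""
    return float(np.dot(vec, gender_direction) /
                 (np.linalg.norm(vec) * np.linalg.norm(gender_direction)))

def proximity_bias_score(word, embeddings, gender_direction, k=100, theta=0.05):
    """Fraction of `word`'s k nearest neighbours whose own gender component
    exceeds the threshold theta -- a rough proxy for the idea behind GIPE,
    not the paper's actual formula."""
    words = list(embeddings.keys())
    vecs = np.stack([embeddings[w] for w in words])
    target = embeddings[word]
    # cosine similarity of the target word with every vocabulary word
    sims = vecs @ target / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(target) + 1e-12)
    # skip index 0, which is the word itself
    neighbours = [words[i] for i in np.argsort(-sims)[1:k + 1]]
    flagged = [n for n in neighbours
               if abs(gender_component(embeddings[n], gender_direction)) > theta]
    return len(flagged) / k
```

Here `embeddings` is assumed to map words to NumPy vectors; a debiased embedding should yield lower scores for profession words such as "nurse" or "engineer".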

Highlights

  • Word embedding methods (Devlin et al., 2019; Mikolov et al., 2013a; Pennington et al., 2014) have been staggeringly successful in mapping the semantic space of words to a space of real-valued vectors, capturing both semantic and syntactic relationships.

  • We demonstrate the ability of RAN-GloVe to mitigate gender proximity bias by computing and contrasting Gender-based Illicit Proximity Estimate (GIPE) values.

  • We demonstrate that RAN-GloVe successfully mitigates gender bias in a downstream application, coreference resolution.


Summary

Introduction

Word embedding methods (Devlin et al., 2019; Mikolov et al., 2013a; Pennington et al., 2014) have been staggeringly successful in mapping the semantic space of words to a space of real-valued vectors, capturing both semantic and syntactic relationships. As recent research has shown, word embeddings possess a spectrum of biases related to gender (Bolukbasi et al., 2016; Hoyle et al., 2019), race, and religion (Manzini et al., 2019; Otterbacher et al., 2017). Bolukbasi et al. (2016) showed that there is a disparity in the association of professions with gender. A word embedding model trained on data from a popular social media platform generates analogies such as "Muslim is to terrorist as Christian is to civilian" (Manzini et al., 2019). Given the large-scale use of word embeddings, it becomes imperative to remove such manifestations of bias. We focus on mitigating gender bias in pre-trained word embeddings.
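
Biased analogies of the kind quoted above are typically surfaced with a standard vector-arithmetic analogy query (3CosAdd). The sketch below is a minimal illustration of that generic query, not the constrained analogy generation used by Bolukbasi et al. (2016) or Manzini et al. (2019); the variable names and the example call are assumptions.

```python
import numpy as np

def analogy(a, b, a_star, embeddings, topn=1):
    """Answer 'a is to b as a_star is to ?' with the 3CosAdd query:
    return the word(s) maximising cos(w, b - a + a_star)."""
    words = list(embeddings.keys())
    vecs = np.stack([embeddings[w] for w in words])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    query = embeddings[b] - embeddings[a] + embeddings[a_star]
    query = query / np.linalg.norm(query)
    sims = vecs @ query
    # exclude the query words themselves from the candidates
    ranked = [words[i] for i in np.argsort(-sims)
              if words[i] not in {a, b, a_star}]
    return ranked[:topn]

# e.g. analogy("man", "doctor", "woman", glove) can surface a biased completion
# when `glove` maps words to biased pre-trained vectors.
```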

