Abstract

Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge regarding the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists, and learns the constraints that must be satisfied by unbiased word embeddings automatically from dictionary definitions of the words. Specifically, we learn an encoder to generate a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embedding, (b) agrees with the unbiased definition of the word according to the dictionary, and (c) remains orthogonal to the vector space spanned by any biased basis vectors in the pre-trained word embedding space. Experimental results on standard benchmark datasets show that the proposed method can accurately remove unfair biases encoded in pre-trained word embeddings, while preserving useful semantics.
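
To make constraints (a)–(c) concrete, the following is a minimal PyTorch sketch of how such an encoder could be trained. The module structure, loss weights, and the way dictionary definitions and bias directions are represented are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a three-part debiasing objective: (a) reconstruct the input
# embedding, (b) match the word's dictionary-definition embedding,
# (c) penalise components along given bias directions.
# All names (Debiaser, alpha, beta, bias_basis) are hypothetical.
import torch
import torch.nn as nn

class Debiaser(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, dim))

    def forward(self, w):
        z = self.encoder(w)          # debiased embedding
        return z, self.decoder(z)    # reconstruction of the input

def debias_loss(model, w, defn, bias_basis, alpha=1.0, beta=1.0):
    # w:          pre-trained embeddings, shape (batch, dim)
    # defn:       embeddings of the words' dictionary definitions, (batch, dim)
    # bias_basis: (assumed orthonormal) bias directions, shape (k, dim)
    z, w_hat = model(w)
    l_sem = ((w_hat - w) ** 2).sum(dim=1).mean()   # (a) retain semantics
    l_def = ((z - defn) ** 2).sum(dim=1).mean()    # (b) agree with definition
    proj = z @ bias_basis.t()                      # components along bias dirs
    l_orth = (proj ** 2).sum(dim=1).mean()         # (c) orthogonality penalty
    return l_sem + alpha * l_def + beta * l_orth
```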

Highlights

  • Although pre-trained word embeddings are useful due to their low dimensionality and memory and compute efficiency, they have been shown to encode not only the semantics of words but also unfair discriminatory biases such as gender, racial or religious biases (Bolukbasi et al., 2016; Zhao et al., 2018a; Rudinger et al., 2018; Zhao et al., 2018b; Elazar and Goldberg, 2018; Kaneko and Bollegala, 2019).

  • We evaluate the proposed method on four standard benchmarks for measuring biases in word embeddings: the Word Embedding Association Test (WEAT; Caliskan et al., 2017), the Word Association Test (WAT; Du et al., 2019), SemBias (Zhao et al., 2018b) and WinoBias (Zhao et al., 2018a); a sketch of the WEAT statistic follows this list.

  • While we focus on static word embeddings in this paper, unfair biases have been found in contextualised word embeddings as well (Zhao et al., 2019; Vig, 2019; Bordia and Bowman, 2019; May et al., 2019).
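
WEAT, referenced above, quantifies bias as the differential association between two sets of target words and two sets of attribute words. Below is a minimal NumPy sketch of its effect size; the function names and the assumption that words are already mapped to vectors are illustrative, not the benchmark's reference implementation.

```python
# WEAT effect size (Caliskan et al., 2017): how differently two target
# word sets X and Y associate with two attribute word sets A and B.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of w to set A minus mean similarity to set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size: separation of the two target sets' association scores,
    # normalised by the standard deviation over all targets.
    # Values near zero indicate little measured bias.
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y)
```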

Introduction

Although pre-trained word embeddings are useful due to their low dimensionality and memory and compute efficiency, they have been shown to encode not only the semantics of words but also unfair discriminatory biases such as gender, racial or religious biases (Bolukbasi et al., 2016; Zhao et al., 2018a; Rudinger et al., 2018; Zhao et al., 2018b; Elazar and Goldberg, 2018; Kaneko and Bollegala, 2019). Although methods that learn word embeddings purely from dictionaries have been proposed (Tissier et al., 2017), they suffer from coverage and data sparseness issues because precompiled dictionaries neither capture the meanings of neologisms nor provide the numerous contexts found in a corpus. Prior work has shown that word embeddings learnt from large text corpora outperform those created from dictionaries in downstream NLP tasks (Alsuhaibani et al., 2019; Bollegala et al., 2016). We must overcome several challenges when using dictionaries to debias pre-trained word embeddings. Not all words in the embeddings will appear in a given dictionary, so a lexicalised debiasing method would generalise poorly to words not in the dictionary. Moreover, it is not known a priori what biases are hidden inside a set of pre-trained word embedding vectors: depending on the source documents used for training, different types of biases will be learnt and amplified to different degrees by different word embedding learning algorithms (Zhao et al., 2017).
