Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber, Michael J. Paul

doi:10.18653/v1/2020.acl-main.201


Open Access


Abstract

Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI). Recent CLWE methods use linear projections, which underfit the training dictionary, to generalize on BLI. However, underfitting can hinder generalization to other downstream tasks that rely on words from the training dictionary. We address this limitation by retrofitting CLWE to the training dictionary, which pulls training translation pairs closer in the embedding space and overfits the training dictionary. This simple post-processing step often improves accuracy on two downstream tasks, despite lowering BLI test accuracy. We also retrofit to both the training dictionary and a synthetic dictionary induced from CLWE, which sometimes generalizes even better on downstream tasks. Our results confirm the importance of fully exploiting the training dictionary in downstream tasks and explain why BLI is a flawed CLWE evaluation.

Highlights

Cross-lingual word embeddings (CLWE) map words across languages to a shared vector space
To balance between fitting the training dictionary and generalizing on other words, we explore retrofitting to both the training dictionary and a synthetic dictionary induced from the CLWE
Although training dictionary retrofitting lowers bilingual lexicon induction (BLI) test accuracy, it improves both downstream tasks’ test accuracy (Figure 3). This confirms that over-optimizing the test BLI accuracy can hurt downstream tasks because training dictionary words are important
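
The retrofitting step described above can be illustrated with a small sketch. It assumes a Faruqui-style iterative update in which each dictionary word moves toward the average of its translations while staying anchored to its original vector. The function name `retrofit`, the weights `alpha` and `beta`, the iteration count, and the language-prefixed keys such as "en:dog" are illustrative assumptions, not details taken from the paper.

```python
def retrofit(embeddings, dictionary, alpha=1.0, beta=1.0, iterations=10):
    """Pull translation pairs closer in a shared embedding space.

    embeddings: dict mapping word -> list of floats (all languages in one space).
    dictionary: iterable of (source_word, target_word) translation pairs.
    alpha: weight keeping a word near its original vector (assumed value).
    beta: weight pulling a word toward its translations (assumed value).
    """
    # Treat the training dictionary as an undirected graph over words.
    neighbors = {}
    for src, tgt in dictionary:
        if src in embeddings and tgt in embeddings:
            neighbors.setdefault(src, set()).add(tgt)
            neighbors.setdefault(tgt, set()).add(src)

    original = {w: list(v) for w, v in embeddings.items()}
    current = {w: list(v) for w, v in embeddings.items()}

    for _ in range(iterations):
        for word, nbrs in neighbors.items():
            # Weighted average of the original vector and the current
            # vectors of all translations of this word.
            total = [alpha * x for x in original[word]]
            for n in nbrs:
                for k, x in enumerate(current[n]):
                    total[k] += beta * x
            denom = alpha + beta * len(nbrs)
            current[word] = [x / denom for x in total]

    # Words outside the dictionary are returned unchanged.
    return current


# Toy example: one translation pair plus one word outside the dictionary.
toy = {"en:dog": [1.0, 0.0], "es:perro": [0.0, 1.0], "en:cat": [0.5, 0.5]}
retrofitted = retrofit(toy, [("en:dog", "es:perro")])
```

In this toy setup the vectors for "en:dog" and "es:perro" end up closer together than they started, while "en:cat", which has no dictionary entry, keeps its original vector. That is the sense in which retrofitting fits the training dictionary more tightly than a linear projection does.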

Summary

Introduction

Cross-lingual word embeddings (CLWE) map words across languages to a shared vector space. Recent supervised CLWE methods follow a projection-based pipeline (Mikolov et al., 2013). A linear projection maps pre-trained monolingual embeddings to a multilingual space. While CLWE enable many multilingual tasks (Klementiev et al., 2012; Guo et al., 2015; Zhang et al., 2016; Ni et al., 2017), most recent work only evaluates CLWE on bilingual lexicon induction (BLI). In BLI, a set of test words is translated with a retrieval heuristic (e.g., nearest neighbor search) and compared against gold translations. BLI accuracy is easy to compute and captures the desired property of CLWE that translation pairs should be close.
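
The BLI evaluation described above can be sketched in a few lines. This is a minimal illustration, assuming the source embeddings have already been projected into the shared space and that retrieval uses cosine-similarity nearest neighbor search. The function names `cosine`, `translate`, and `bli_accuracy` and the toy vectors are hypothetical and chosen only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word, src_emb, tgt_emb):
    """Return the target word whose vector is nearest to the source word."""
    query = src_emb[word]
    return max(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]))


def bli_accuracy(test_dict, src_emb, tgt_emb):
    """Fraction of test words whose retrieved translation is a gold one.

    test_dict: dict mapping source word -> set of gold translations.
    """
    correct = sum(
        translate(src, src_emb, tgt_emb) in gold
        for src, gold in test_dict.items()
    )
    return correct / len(test_dict)


# Toy shared space (assumed to be the output of a projection step).
src_emb = {"dog": [0.9, 0.1], "cat": [0.1, 0.9]}
tgt_emb = {"perro": [1.0, 0.0], "gato": [0.0, 1.0], "casa": [0.7, 0.7]}
gold = {"dog": {"perro"}, "cat": {"gato"}}
accuracy = bli_accuracy(gold, src_emb, tgt_emb)
```

The score depends only on held-out test words, so it rewards a projection that generalizes to unseen pairs and says nothing about how well training dictionary pairs are aligned.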

Methods

Results

Conclusion