Abstract

Multilingual Neural Machine Translation (NMT) models have yielded large empirical success in transfer learning settings. However, these black-box representations are poorly understood, and their mode of transfer remains elusive. In this work, we attempt to understand massively multilingual NMT representations (covering 103 languages) using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models. Our analysis validates several empirical results and long-standing intuitions, and unveils new observations regarding how representations evolve in a multilingual translation model. We draw three major results from our analysis, with implications for cross-lingual transfer learning: (i) encoder representations of different languages cluster based on linguistic similarity, (ii) representations of a source language learned by the encoder are dependent on the target language, and vice versa, and (iii) representations of high-resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero-shot or few-shot setting. We further connect our findings with existing empirical observations in multilingual NMT and transfer learning.

Highlights

  • Multilingual Neural Machine Translation (NMT) models have demonstrated great improvements for cross-lingual transfer, on tasks including low-resource language translation (Zoph et al., 2016; Nguyen and Chiang, 2017; Neubig and Hu, 2018) and zero- or few-shot transfer learning for downstream tasks (Eriguchi et al., 2018; Lample and Conneau, 2019; Wu and Dredze, 2019)

  • Singular Value Canonical Correlation Analysis (SVCCA) is one such method that allows us to analyze the similarity between noisy, high-dimensional representations of the same datapoints learned across different models, layers and tasks (Raghu et al., 2017); a minimal sketch of the procedure follows this list

  • We structure the study into these sections: In Section 2, we describe the experimental setup and tools used to train and analyze our multilingual NMT model
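
To make the SVCCA procedure concrete, here is a minimal sketch (not the authors' implementation) following Raghu et al. (2017): each activation matrix is first reduced with SVD to the directions explaining most of its variance, and the reduced views are then compared with CCA, with the mean canonical correlation serving as the similarity score. The activation matrices and names below are hypothetical stand-ins.

    # A minimal SVCCA sketch, after Raghu et al. (2017).
    # Step 1: SVD-reduce each activation matrix; Step 2: CCA between the reduced views.
    import numpy as np

    def svd_reduce(acts, var_fraction=0.99):
        # acts: (n_datapoints, n_neurons) activations for one layer/model/language.
        acts = acts - acts.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        # Keep the top singular directions explaining `var_fraction` of the variance.
        k = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), var_fraction)) + 1
        return u[:, :k] * s[:k]

    def cca_correlations(x, y):
        # Canonical correlations between two views of the same datapoints.
        qx, _ = np.linalg.qr(x - x.mean(axis=0))
        qy, _ = np.linalg.qr(y - y.mean(axis=0))
        return np.linalg.svd(qx.T @ qy, compute_uv=False)

    def svcca_similarity(acts1, acts2):
        # Mean canonical correlation of the SVD-reduced views; 1.0 = identical subspaces.
        return float(cca_correlations(svd_reduce(acts1), svd_reduce(acts2)).mean())

    # Usage with stand-in activations: a linear transform of the same signal scores near 1.
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(500, 64))
    acts_b = acts_a @ rng.normal(size=(64, 64))
    print(svcca_similarity(acts_a, acts_b))

In the paper's setting, acts_a and acts_b would instead hold encoder activations for the same aligned sentences in two languages, or for the same language at two layers or in two models.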

Introduction

Multilingual Neural Machine Translation (NMT) models have demonstrated great improvements for cross-lingual transfer, on tasks including low-resource language translation (Zoph et al., 2016; Nguyen and Chiang, 2017; Neubig and Hu, 2018) and zero- or few-shot transfer learning for downstream tasks (Eriguchi et al., 2018; Lample and Conneau, 2019; Wu and Dredze, 2019). A possible explanation is the ability of multilingual models to encode text from different languages in a shared representation space, resulting in similar sentences being aligned together (Firat et al., 2016; Johnson et al., 2017; Aharoni et al., 2019; Arivazhagan et al., 2019b). This is supported by the success of multilingual representations on tasks like sentence alignment across languages (Artetxe and Schwenk, 2018), zero-shot cross-lingual classification (Eriguchi et al., 2018) and XNLI (Lample and Conneau, 2019). Our work is the first attempt to understand the nature of multilingual representations and cross-lingual transfer in deep neural networks, based on analyzing a model trained on 103 languages simultaneously.
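
To illustrate finding (i) from the abstract, the sketch below (hypothetical data, reusing the svcca_similarity helper sketched earlier) turns pairwise SVCCA scores between languages' encoder representations into distances and clusters them hierarchically; in the paper, the resulting groupings track linguistic similarity.

    # Hypothetical illustration: cluster languages by encoder-representation similarity.
    # `acts` would hold encoder activations per language over the same aligned sentences;
    # here they are random stand-ins, and `svcca_similarity` is the helper defined above.
    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    languages = ["en", "de", "nl", "hi", "ur"]  # hypothetical subset of the 103 languages
    rng = np.random.default_rng(1)
    acts = {lang: rng.normal(size=(500, 64)) for lang in languages}

    n = len(languages)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Convert SVCCA similarity (1 = identical) into a distance (0 = identical).
            dist[i, j] = dist[j, i] = 1.0 - svcca_similarity(acts[languages[i]],
                                                             acts[languages[j]])

    # Average-linkage clustering over the condensed (upper-triangle) distance vector.
    tree = linkage(dist[np.triu_indices(n, k=1)], method="average")
    print(dendrogram(tree, labels=languages, no_plot=True)["ivl"])  # dendrogram leaf order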

The paper is organized into the following sections:

  • Data and Model
  • Heavy imbalance between language pairs
  • Diversity
  • Multilingual NMT Learns Language Similarity
  • What is Language Similarity?
  • Representations cluster by language similarity
  • Representational Similarity evolves across Layers
  • Analyzing representation robustness to fine-tuning
  • Discussion
  • Conclusion