Abstract

Multilingual Neural Machine Translation (NMT) models have yielded large empirical success in transfer learning settings. However, these black-box representations are poorly understood, and their mode of transfer remains elusive. In this work, we attempt to understand massively multilingual NMT representations (covering 103 languages) using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models. Our analysis validates several empirical results and long-standing intuitions, and unveils new observations regarding how representations evolve in a multilingual translation model. We draw three major results from our analysis, with implications for cross-lingual transfer learning: (i) encoder representations of different languages cluster based on linguistic similarity, (ii) representations of a source language learned by the encoder are dependent on the target language, and vice versa, and (iii) representations of high-resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero-shot or few-shot setting. We further connect our findings with existing empirical observations in multilingual NMT and transfer learning.

Highlights

  • Multilingual Neural Machine Translation (NMT) models have demonstrated great improvements for cross-lingual transfer, on tasks including low-resource language translation (Zoph et al., 2016; Nguyen and Chiang, 2017; Neubig and Hu, 2018) and zero- or few-shot transfer learning for downstream tasks (Eriguchi et al., 2018; Lample and Conneau, 2019; Wu and Dredze, 2019)

  • Singular Value Canonical Correlation Analysis (SVCCA) is one such method that allows us to analyze the similarity between noisy, high-dimensional representations of the same datapoints learned across different models, layers and tasks (Raghu et al., 2017); a minimal sketch of the procedure follows this list

  • We structure the study into these sections: In Section 2, we describe the experimental setup and tools used to train and analyze our multilingual NMT model
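
To make the SVCCA procedure concrete, here is a minimal sketch (not the authors' implementation) following Raghu et al. (2017): each activation matrix is first reduced with SVD to the directions explaining most of its variance, and the reduced views are then compared with CCA, with the mean canonical correlation serving as the similarity score. The activation matrices and names below are hypothetical stand-ins.

    # A minimal SVCCA sketch, after Raghu et al. (2017).
    # Step 1: SVD-reduce each activation matrix; Step 2: CCA between the reduced views.
    import numpy as np

    def svd_reduce(acts, var_fraction=0.99):
        # acts: (n_datapoints, n_neurons) activations for one layer/model/language.
        acts = acts - acts.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        # Keep the top singular directions explaining `var_fraction` of the variance.
        k = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), var_fraction)) + 1
        return u[:, :k] * s[:k]

    def cca_correlations(x, y):
        # Canonical correlations between two views of the same datapoints.
        qx, _ = np.linalg.qr(x - x.mean(axis=0))
        qy, _ = np.linalg.qr(y - y.mean(axis=0))
        return np.linalg.svd(qx.T @ qy, compute_uv=False)

    def svcca_similarity(acts1, acts2):
        # Mean canonical correlation of the SVD-reduced views; 1.0 = identical subspaces.
        return float(cca_correlations(svd_reduce(acts1), svd_reduce(acts2)).mean())

    # Usage with stand-in activations: a linear transform of the same signal scores near 1.
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(500, 64))
    acts_b = acts_a @ rng.normal(size=(64, 64))
    print(svcca_similarity(acts_a, acts_b))

In the paper's setting, acts_a and acts_b would instead hold encoder activations for the same aligned sentences in two languages, or for the same language at two layers or in two models.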

Introduction

Multilingual Neural Machine Translation (NMT) models have demonstrated great improvements for cross-lingual transfer, on tasks including low-resource language translation (Zoph et al., 2016; Nguyen and Chiang, 2017; Neubig and Hu, 2018) and zero- or few-shot transfer learning for downstream tasks (Eriguchi et al., 2018; Lample and Conneau, 2019; Wu and Dredze, 2019). A possible explanation is the ability of multilingual models to encode text from different languages in a shared representation space, resulting in similar sentences being aligned together (Firat et al., 2016; Johnson et al., 2017; Aharoni et al., 2019; Arivazhagan et al., 2019b). This is supported by the success of multilingual representations on tasks like sentence alignment across languages (Artetxe and Schwenk, 2018), zero-shot cross-lingual classification (Eriguchi et al., 2018) and XNLI (Lample and Conneau, 2019). Our work is the first attempt to understand the nature of multilingual representations and cross-lingual transfer in deep neural networks, based on analyzing a model trained on 103 languages simultaneously.
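
To illustrate finding (i) from the abstract, the sketch below (hypothetical data, reusing the svcca_similarity helper sketched earlier) turns pairwise SVCCA scores between languages' encoder representations into distances and clusters them hierarchically; in the paper, the resulting groupings track linguistic similarity.

    # Hypothetical illustration: cluster languages by encoder-representation similarity.
    # `acts` would hold encoder activations per language over the same aligned sentences;
    # here they are random stand-ins, and `svcca_similarity` is the helper defined above.
    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    languages = ["en", "de", "nl", "hi", "ur"]  # hypothetical subset of the 103 languages
    rng = np.random.default_rng(1)
    acts = {lang: rng.normal(size=(500, 64)) for lang in languages}

    n = len(languages)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Convert SVCCA similarity (1 = identical) into a distance (0 = identical).
            dist[i, j] = dist[j, i] = 1.0 - svcca_similarity(acts[languages[i]],
                                                             acts[languages[j]])

    # Average-linkage clustering over the condensed (upper-triangle) distance vector.
    tree = linkage(dist[np.triu_indices(n, k=1)], method="average")
    print(dendrogram(tree, labels=languages, no_plot=True)["ivl"])  # dendrogram leaf order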

The paper is organized into the following sections:

  • Data and Model
  • Heavy imbalance between language pairs
  • Diversity
  • Multilingual NMT Learns Language Similarity
  • What is Language Similarity?
  • Representations cluster by language similarity
  • Representational Similarity evolves across Layers
  • Analyzing representation robustness to fine-tuning
  • Discussion
  • Conclusion