Abstract
Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects of the neural MT system and its ability to capture word structure.
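To make the extrinsic evaluation concrete, the following minimal sketch (not the authors' code) shows the general classifier-based probing recipe: freeze a trained NMT model, extract one hidden-state vector per source word, and train a simple classifier to predict POS or morphological tags from those vectors, taking its accuracy as a measure of how much word structure the representations encode. The data here is a random stand-in, and the off-the-shelf logistic-regression probe is an assumption for illustration.

```python
# Minimal sketch of classifier-based probing for morphology.
# Assumes encoder states have already been extracted from a trained
# (and frozen) NMT model: one feature vector and one gold tag per word.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 10,000 words, 500-dim encoder states, 17 POS tags.
# Replace these arrays with real extracted representations and gold tags.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(10_000, 500))
pos_tags = rng.integers(0, 17, size=10_000)

# Hold out a test set; the NMT model itself is never updated.
X_train, X_test, y_train, y_test = train_test_split(
    encoder_states, pos_tags, test_size=0.2, random_state=0)

# Train a simple classifier on top of the fixed representations.
# Its test accuracy serves as the quality measure for the representations.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("POS tagging accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same recipe can be rerun with states taken from different encoder layers, from the decoder, or from word-based vs. character-based models, so that differences in probing accuracy can be attributed to the representation rather than to the classifier.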
Highlights
Neural network models are quickly becoming the predominant approach to machine translation (MT).
Recent work has started exploring the role of the neural MT (NMT) encoder in learning source syntax (Shi et al., 2016), but research has yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture.
Note that our goal is not to beat the state-of-the-art on a given task, but rather to analyze what NMT models learn about morphology.
Summary
Neural network models are quickly becoming the predominant approach to machine translation (MT). Training neural MT (NMT) models can be done in an end-to-end fashion, which is simpler and more elegant than traditional MT systems. Recent work has started exploring the role of the NMT encoder in learning source syntax (Shi et al., 2016), but research has yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations (character-based vs. word-based) have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture. We strive towards exploring (i), (ii), and (iii) by providing quantitative, data-driven answers to these questions.