Abstract

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects of the neural MT system and its ability to capture word structure.
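
To make the evaluation setup concrete, here is a minimal PyTorch sketch of the extract-and-classify procedure described above: per-word hidden states are taken from a frozen encoder and fed to a small tagging classifier, so that tagging accuracy reflects what the representations themselves encode. The randomly initialized "encoder" and the synthetic (words, tags) data are stand-ins for a trained NMT encoder and an annotated corpus, and the classifier architecture and dimensions are illustrative assumptions, not taken from the paper.

```python
# Sketch: probe frozen encoder states with a part-of-speech tagging classifier.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, NUM_TAGS, EMB_DIM, HID_DIM = 1000, 12, 64, 128

# Stand-in for a trained NMT encoder (embedding + bidirectional LSTM).
embed = nn.Embedding(VOCAB, EMB_DIM)
encoder = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True, bidirectional=True)

# Placeholder "annotated corpus": 256 sentences of 20 tokens, with POS tags.
words = torch.randint(0, VOCAB, (256, 20))
tags = torch.randint(0, NUM_TAGS, (256, 20))

# Extract per-word representations with the encoder frozen (no gradients).
with torch.no_grad():
    states, _ = encoder(embed(words))        # (256, 20, 2 * HID_DIM)

# The probing classifier is deliberately simple (one linear layer), so its
# accuracy reflects the representations rather than the classifier's power.
clf = nn.Linear(2 * HID_DIM, NUM_TAGS)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    opt.zero_grad()
    logits = clf(states)                     # (256, 20, NUM_TAGS)
    loss = loss_fn(logits.reshape(-1, NUM_TAGS), tags.reshape(-1))
    loss.backward()
    opt.step()

accuracy = (clf(states).argmax(-1) == tags).float().mean()
print(f"POS tagging accuracy on the probing data: {accuracy:.3f}")
```

The same procedure applies at different depths of the encoder, on the decoder side, and with morphological tags in place of POS tags, which is how the comparisons along layer depth, module, and task are carried out.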

Highlights

  • Neural network models are quickly becoming the predominant approach to machine translation (MT)

  • Recent work has started exploring the role of the neural MT (NMT) encoder in learning source syntax (Shi et al., 2016), but research studies are yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture

  • We emphasize that our goal is not to beat the state-of-the-art on a given task, but rather to analyze what NMT models learn about morphology

Summary

Introduction

Neural network models are quickly becoming the predominant approach to machine translation (MT). Training neural MT (NMT) models can be done in an end-to-end fashion, which is simpler and more elegant than traditional MT systems. Recent work has started exploring the role of the NMT encoder in learning source syntax (Shi et al., 2016), but research studies are yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations (character-based vs. word-based) have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture. We strive towards exploring (i), (ii), and (iii) by providing quantitative, data-driven answers.
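
As a concrete illustration of question (iii), the sketch below contrasts the two input representations compared in this work: a word-embedding lookup versus a word vector composed from its characters. The convolution-plus-max-pooling character model shown here is a common choice used purely for illustration; the exact architecture, dimensions, and variable names are assumptions, not taken from the paper.

```python
# Sketch: word-based vs. character-based word representations.
import torch
import torch.nn as nn

WORD_VOCAB, CHAR_VOCAB, DIM = 1000, 50, 64

# Word-based: one vector per word type; rare or unseen words fall back to UNK.
word_embed = nn.Embedding(WORD_VOCAB, DIM)
word_ids = torch.tensor([5, 42, 7])              # a three-word "sentence"
word_repr = word_embed(word_ids)                 # (3, DIM)

# Character-based: compose each word's vector from its characters, so that
# morphological variants (walk / walks / walked) share subword structure.
char_embed = nn.Embedding(CHAR_VOCAB, 16)
conv = nn.Conv1d(in_channels=16, out_channels=DIM, kernel_size=3, padding=1)

char_ids = torch.randint(0, CHAR_VOCAB, (3, 8))  # 3 words, 8 characters each (padded)
chars = char_embed(char_ids).transpose(1, 2)     # (3, 16, 8), as Conv1d expects
char_repr = conv(chars).max(dim=2).values        # max-pool over positions -> (3, DIM)

# Either representation can feed the same NMT encoder; probing then asks
# which one yields hidden states that better encode morphology.
print(word_repr.shape, char_repr.shape)          # both torch.Size([3, 64])
```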
