Abstract
Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects of the neural MT system and its ability to capture word structure.
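To make the extrinsic evaluation concrete, the following minimal sketch (not the authors' code) shows the general classifier-based probing recipe: freeze a trained NMT model, extract one hidden-state vector per source word, and train a simple classifier to predict POS or morphological tags from those vectors, taking its accuracy as a measure of how much word structure the representations encode. The data here is a random stand-in, and the off-the-shelf logistic-regression probe is an assumption for illustration.

```python
# Minimal sketch of classifier-based probing for morphology.
# Assumes encoder states have already been extracted from a trained
# (and frozen) NMT model: one feature vector and one gold tag per word.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 10,000 words, 500-dim encoder states, 17 POS tags.
# Replace these arrays with real extracted representations and gold tags.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(10_000, 500))
pos_tags = rng.integers(0, 17, size=10_000)

# Hold out a test set; the NMT model itself is never updated.
X_train, X_test, y_train, y_test = train_test_split(
    encoder_states, pos_tags, test_size=0.2, random_state=0)

# Train a simple classifier on top of the fixed representations.
# Its test accuracy serves as the quality measure for the representations.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("POS tagging accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same recipe can be rerun with states taken from different encoder layers, from the decoder, or from word-based vs. character-based models, so that differences in probing accuracy can be attributed to the representation rather than to the classifier.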
Highlights
Neural network models are quickly becoming the predominant approach to machine translation (MT).
Recent work has started exploring the role of the neural MT (NMT) encoder in learning source syntax (Shi et al., 2016), but research has yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture.
Note that our goal is not to beat the state-of-the-art on a given task, but rather to analyze what NMT models learn about morphology.
Summary
Neural network models are quickly becoming the predominant approach to machine translation (MT). Training neural MT (NMT) models can be done in an end-to-end fashion, which is simpler and more elegant than traditional MT systems. Recent work has started exploring the role of the NMT encoder in learning source syntax (Shi et al., 2016), but research has yet to answer important questions such as: (i) what do NMT models learn about word morphology? (ii) what is the effect on learning when translating into/from morphologically-rich languages? (iii) what impact do different representations (character-based vs. word-based) have on learning? and (iv) what do different modules learn about the syntactic and semantic structure of a language? Answering such questions is imperative for fully understanding the NMT architecture. We strive towards exploring (i), (ii), and (iii) by providing quantitative, data-driven answers to these questions.