Abstract

Small perturbations in the input can severely distort intermediate representations and thus impact translation quality of neural machine translation (NMT) models. In this paper, we propose to improve the robustness of NMT models with adversarial stability training. The basic idea is to make both the encoder and decoder in NMT models robust against input perturbations by enabling them to behave similarly for the original input and its perturbed counterpart. Experimental results on Chinese-English, English-German and English-French translation tasks show that our approaches can not only achieve significant improvements over strong NMT systems but also improve the robustness of NMT models.
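The basic idea above can be sketched as a toy invariance penalty between encoder representations of an input and its perturbed counterpart. This is a hedged illustration, not the paper's exact formulation: the mean-of-embeddings "encoder", the toy embeddings, and the one-typo perturbation are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def encode(tokens, emb):
    # Toy "encoder": mean of word embeddings, standing in for a real
    # RNN/Transformer encoder that produces sentence representations.
    return np.mean([emb[t] for t in tokens], axis=0)

def invariance_loss(h_orig, h_pert):
    # Penalize the distance between representations of the original and
    # perturbed inputs, pushing the encoder to behave similarly on both.
    return float(np.sum((h_orig - h_pert) ** 2))

# Hypothetical embeddings; "cta" is a typo-perturbed version of "cat".
emb = {
    "cat": np.array([1.0, 0.0]),
    "cta": np.array([0.9, 0.1]),
    "sat": np.array([0.0, 1.0]),
}
h_x = encode(["cat", "sat"], emb)
h_x_pert = encode(["cta", "sat"], emb)  # input with a misspelling
stability_term = invariance_loss(h_x, h_x_pert)
# In training, this term would be added to the usual translation losses
# on both the original and the perturbed input.
```

In the actual training objective, a term like `stability_term` would be weighted and summed with the translation losses so that the encoder and decoder are optimized to behave consistently under perturbation.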

Highlights

  • Neural machine translation (NMT) models have advanced the state of the art by building a single neural network that can better learn representations (Cho et al., 2014; Sutskever et al., 2014).

  • Instability makes NMT models sensitive to misspellings and typos in the input text. We address this challenge with adversarial stability training for neural machine translation.

  • Given that our approach can be applied to any NMT system, we expect that the adversarial stability training mechanism can further improve performance on top of more advanced NMT architectures.



Introduction

Neural machine translation (NMT) models have advanced the state of the art by building a single neural network that can better learn representations (Cho et al., 2014; Sutskever et al., 2014). The network consists of two components: an encoder that maps the input sentence into a sequence of distributed representations, and a decoder that generates the translation from those representations with an attention model (Bahdanau et al., 2015; Luong et al., 2015). Formally, the translation probability P(y|x; θ) is defined over this holistic network: the encoder encodes a source sentence x into a sequence of hidden representations H_x = h_1, ..., h_M, and the decoder generates the n-th target word conditioned on H_x and the previously generated words, so that P(y|x; θ) = ∏_{n=1}^{N} P(y_n | y_{<n}, H_x; θ).
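One decoding step of the attention model described above can be sketched as follows. This is a hedged illustration: it uses simple dot-product attention scores, whereas Bahdanau et al. (2015) score with a small feed-forward network; the dimensions and states are toy values.

```python
import numpy as np

def attention_context(H, s):
    # H: (M, d) encoder hidden representations h_1..h_M.
    # s: (d,) current decoder state.
    scores = H @ s                        # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over source positions
    return weights @ H                    # context vector, shape (d,)

# Toy example: M = 3 source positions, d = 2 hidden dimensions.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
s = np.array([1.0, 0.0])
c = attention_context(H, s)
# c is the attention-weighted summary of the source that the decoder
# would combine with s to predict the next target word.
```

The decoder recomputes such a context vector at every step, which is why distorted encoder representations H_x propagate directly into the generated translation.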
