Abstract

This work explores the application of recurrent neural network (RNN) language and translation models during phrase-based decoding. Because they use unbounded context, integrating RNNs into the decoder is more challenging than integrating feedforward neural models. In this paper, we apply approximations and use caching to enable RNN decoder integration while requiring reasonable memory and time resources. We analyze the effect of caching on translation quality and speed, and use it to integrate RNN language and translation models into a phrase-based decoder. To the best of our knowledge, no previous work has discussed the integration of RNN translation models into phrase-based decoding. We also show that a special RNN can be integrated efficiently without the need for approximations. We compare decoding using RNNs to rescoring n-best lists on two tasks: IWSLT 2013 German→English and BOLT Arabic→English. We demonstrate that the performance of decoding with RNNs is at least as good as using them in rescoring.
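As a rough illustration of the caching idea, the sketch below memoizes RNN state extensions during phrase-based hypothesis expansion, so that identical (state, word) continuations are scored only once. This is a minimal sketch under stated assumptions, not the authors' implementation; the names CachedRNNScorer, initial_state, and step are hypothetical placeholders for an actual recurrent language model interface.

```python
# Minimal sketch (not the paper's implementation): cache RNN state
# extensions so repeated (state, word) continuations across partial
# hypotheses are computed only once. `rnn_lm` is assumed to expose
# initial_state() and step(state, word) -> (new_state, logprob).

class CachedRNNScorer:
    """Wraps an RNN LM so repeated extensions reuse cached states."""

    def __init__(self, rnn_lm):
        self.rnn_lm = rnn_lm
        self.cache = {}                          # (state_id, word) -> (new_state_id, logprob)
        self.states = [rnn_lm.initial_state()]   # state_id -> RNN hidden state

    def extend(self, state_id, word):
        """Score one word continuation, reusing the cache when possible."""
        key = (state_id, word)
        if key not in self.cache:
            new_state, logprob = self.rnn_lm.step(self.states[state_id], word)
            self.states.append(new_state)
            self.cache[key] = (len(self.states) - 1, logprob)
        return self.cache[key]

    def score_phrase(self, state_id, words):
        """Extend a partial hypothesis by a target phrase, word by word."""
        total = 0.0
        for w in words:
            state_id, logprob = self.extend(state_id, w)
            total += logprob
        return state_id, total
```

In a full decoder, the stored states would additionally be subject to the approximations discussed in the paper (e.g., hypothesis recombination over truncated context), which keeps the cache within reasonable memory.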

Highlights

  • Applying neural networks to statistical machine translation has been gaining increasing attention recently

  • These improvements are at least as good as those of rescoring. This holds both for the exact bidirectional translation model (BTM) and for the approximate language model (LM) and joint model (JM) cases

  • The results show that recurrent neural network (RNN) LM rescoring can be improved upon when the RNN LM is included directly in decoding


Summary

Introduction

Applying neural networks to statistical machine translation has been gaining increasing attention recently. Neural networks have been used for standalone decoding with a simple beam-search word-based decoder (Sutskever et al., 2014; Bahdanau et al., 2015). Another approach is to apply neural models directly within a phrase-based decoder. We focus on this approach, which is challenging because phrase-based decoding typically involves generating tens or even hundreds of millions of partial hypotheses. Scoring this many hypotheses with neural models is expensive, mainly due to the usually large output layer. Decoder integration was done in (Vaswani et al., 2013) for feedforward neural language models. Devlin et al. (2014) integrate feedforward translation models into phrase-based decoding and report major improvements, which highlight the strength of the underlying models.
