Abstract

We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according to the SMT lattice. This makes our approach much more flexible than n-best list or lattice rescoring, as the neural decoder is not restricted to the SMT search space. We show an efficient and simple way to integrate risk estimation into the NMT decoder which is suitable for word-level as well as subword-unit-level NMT. We test our method on English-German and Japanese-English and report significant gains over lattice rescoring on several data sets for both single and ensembled NMT. The MBR decoder produces entirely new hypotheses far beyond simply rescoring the SMT search space or fixing UNKs in the NMT output.

Highlights

  • Lattice minimum Bayes-risk (LMBR) decoding has been applied successfully to translation lattices in traditional statistical machine translation (SMT) to improve translation performance of a single system (Kumar and Byrne, 2004; Tromble et al, 2008; Blackwood et al, 2010)

  • We show how to reformulate the original LMBR decision rule so that it can be used in a word-based neural machine translation (NMT) decoder which is not restricted to an n-best list or a lattice

  • We propose to collect statistics for MBR from a potentially large translation lattice generated with SMT, and use the n-gram posteriors as an additional score in NMT decoding
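
The last highlight can be illustrated with a minimal sketch of how n-gram posteriors from an SMT lattice might be added to an NMT score. This is an illustration under stated assumptions, not the authors' implementation: the function name `combined_score`, the weights `theta` and `nmt_weight`, and the per-time-step form of the sum are all hypothetical choices made here. The posteriors are assumed to be precomputed from the lattice and supplied as a plain dictionary.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]


def combined_score(
    hypothesis: Sequence[str],
    nmt_log_probs: Sequence[float],
    ngram_posteriors: Dict[NGram, float],
    theta: Sequence[float],
    nmt_weight: float = 1.0,
    max_order: int = 4,
) -> float:
    """Score a (partial) hypothesis with NMT log-probs plus n-gram posteriors.

    Assumptions (not taken from the source):
      - nmt_log_probs[t] is the NMT log-probability of hypothesis[t].
      - ngram_posteriors maps n-grams to posteriors estimated from an SMT lattice.
      - theta[0] is a per-word bonus; theta[n] weights n-grams of order n.
    """
    if len(hypothesis) != len(nmt_log_probs):
        raise ValueError("need one NMT log-probability per token")
    if len(theta) < max_order + 1:
        raise ValueError("theta needs max_order + 1 entries")

    total = 0.0
    for t, log_p in enumerate(nmt_log_probs):
        step = theta[0] + nmt_weight * log_p
        # Add the posterior of every n-gram that ends at position t.
        for n in range(1, max_order + 1):
            if t + 1 < n:
                break
            ngram = tuple(hypothesis[t - n + 1 : t + 1])
            # N-grams absent from the lattice contribute zero rather than
            # ruling the hypothesis out.
            step += theta[n] * ngram_posteriors.get(ngram, 0.0)
        total += step
    return total


posteriors = {("a",): 0.5, ("b",): 0.25, ("a", "b"): 0.2}
score = combined_score(
    ["a", "b"], [-1.0, -2.0], posteriors, theta=[0.1, 1.0, 2.0], max_order=2
)
```

Because the score accumulates one term per time step, a sketch like this could be evaluated incrementally inside beam search. N-grams that do not appear in the lattice only lose the posterior bonus, which is consistent with the decoder not being restricted to the SMT search space.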


Summary

Introduction

Lattice minimum Bayes-risk (LMBR) decoding has been applied successfully to translation lattices in traditional SMT to improve translation performance of a single system (Kumar and Byrne, 2004; Tromble et al, 2008; Blackwood et al, 2010). Minimum Bayes-risk (MBR) decoding is also a very powerful framework for combining diverse systems (Sim et al, 2007; de Gispert et al, 2009). We study combining traditional SMT and NMT in a hybrid decoding scheme based on MBR. We argue that MBR-based methods in their present form are not well suited to NMT for the following reasons: NMT decoding usually relies on beam search with a limited beam and produces very narrow lattices (Li and Jurafsky, 2016; Vijayakumar et al, 2016), and it is difficult to collect the statistics needed for risk calculation for NMT.
