Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars

Tong Xiao, Jingbo Zhu, Muhua Zhu

doi:10.1145/2025384.2025386

Abstract

The poor grammaticality of Machine Translation (MT) output motivates syntax-based approaches to language modeling. However, previous studies showed that syntax-based language modeling using (Context-Free) Treebank Grammars was not very helpful in improving BLEU scores for Chinese-English machine translation. In this article we study this issue further in the context of Chinese-English syntax-based Statistical Machine Translation (SMT), where Synchronous Tree Substitution Grammars (STSGs) are used to model the translation process. In particular, we develop a Tree Substitution Grammar-based language model for syntax-based MT and present three methods to efficiently integrate the proposed language model into MT decoding. In addition, we design a simple and effective method to adapt syntax-based language models for MT tasks. We demonstrate that the proposed methods benefit a state-of-the-art syntax-based MT system. On the NIST Chinese-English MT evaluation corpora, we achieve an improvement of 0.6 BLEU points over the baseline.

Full Text