Abstract

The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model for dependency structures that is relational rather than configurational and thus particularly suited for languages with a (relatively) free word order. It is trainable with neural networks and not only improves over standard n-gram language models, but also outperforms related syntactic language models. We empirically demonstrate its effectiveness in terms of perplexity and as a feature function in string-to-tree SMT from English to German and Russian. We also show that using a syntactic evaluation metric to tune the log-linear parameters of an SMT system further increases translation quality when coupled with a syntactic language model.
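For context, the dependency language model is used as one feature function within the standard log-linear framework for SMT; a schematic of that framework (the general formulation, not the paper's specific feature set) is:

    \hat{e} = \arg\max_{e} \sum_{k=1}^{K} \lambda_k \, h_k(e, f)

where f is the source sentence, e a candidate translation, the h_k are feature functions (translation-model scores, the n-gram language model, and here an additional syntactic language model), and the \lambda_k are the log-linear weights tuned against an evaluation metric.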

Highlights

  • Many languages exhibit fluency phenomena that are discontinuous in the surface string and are not modelled well by traditional n-gram language models.

  • While all these aspects are important for successfully applying a syntactic language model, our primary contributions are a novel dependency language model, which improves over prior work by making relational modelling assumptions that we argue are better suited for languages with a (relatively) free word order, and the use of a syntactic evaluation metric for optimizing the log-linear parameters of the SMT model.

  • The dependency language models all show a preference for the reference translation, with DLM having a stronger preference than the model by Shen et al. (2010), and RDLM having the strongest preference.

Summary

Introduction

Many languages exhibit fluency phenomena that are discontinuous in the surface string and are not modelled well by traditional n-gram language models. Syntactic language models try to overcome the limitation to a local n-gram context by using syntactically related words (and non-terminals) as context information. Despite their theoretical attractiveness, it has proven difficult to improve SMT with parsers as language models (Och et al., 2004; Post and Gildea, 2008). This paper describes an effective method to model, train, decode with, and weight a syntactic language model for SMT. While all these aspects are important for successfully applying a syntactic language model, our primary contributions are a novel dependency language model, which improves over prior work by making relational modelling assumptions that we argue are better suited for languages with a (relatively) free word order, and the use of a syntactic evaluation metric for optimizing the log-linear parameters of the SMT model.
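As a schematic contrast (not the paper's exact factorization), an n-gram language model conditions each word only on its n-1 string-level predecessors, whereas a dependency language model conditions it on syntactically related words in a dependency tree T, for example its head and previously generated siblings:

    P_{\text{ngram}}(w_1^m) = \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})

    P_{\text{dep}}(w_1^m, T) \approx \prod_{i=1}^{m} P(w_i \mid \mathrm{head}(w_i), \mathrm{siblings}(w_i), \ldots)

The relational assumption referred to above means that the model conditions on grammatical relations (dependency labels) rather than on positional configuration alone, so that agreement and subcategorisation constraints can be captured between words that are adjacent in the tree but arbitrarily far apart in the string; the precise conditioning context is defined in the full paper.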
