Abstract

In current research, most tree-based translation models are built directly from parse trees. In this study, we go in another direction and build a translation model with an unsupervised tree structure derived from a novel non-parametric Bayesian model. In the model, we utilize synchronous tree substitution grammars (STSG) to capture the bilingual mapping between language pairs. To train the model efficiently, we develop a Gibbs sampler with three novel Gibbs operators. The sampler is capable of exploring the infinite space of tree structures by performing local changes on the tree nodes. Experimental results show that the string-to-tree translation system using our Bayesian tree structures significantly outperforms the strong baseline string-to-tree system using parse trees.

Highlights

  • In recent years, tree-based translation models1 are drawing more and more attention in the community of statistical machine translation (SMT)

  • Based on the above analysis, we can conclude that the tree structure that is independent from Tree-bank resources and simultaneously considers the bilingual mapping inside the bilingual sentence pairs would be a good choice for building treebased translation models

  • We can validate our former analysis that the unsupervised tree (U-tree) generated by our Bayesian tree induction model are more appropriate for string-to-tree translation than parse trees

Read more

Summary

Introduction

Tree-based translation models are drawing more and more attention in the community of statistical machine translation (SMT). 2) Parse trees are only used to model and explain the monolingual structure, rather than the bilingual mapping between language pairs. This indicates that parse trees are usually not the optimal choice for training tree-based translation models (Wang et al, 2010). Based on the above analysis, we can conclude that the tree structure that is independent from Tree-bank resources and simultaneously considers the bilingual mapping inside the bilingual sentence pairs would be a good choice for building treebased translation models

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call