Abstract

When translating between two languages that differ in their degree of morphological synthesis, syntactic structures in one language may be realized as morphological structures in the other, and SMT models need a mechanism to learn such translations. Prior work has used morpheme splitting with flat representations that do not encode the hierarchical structure between morphemes, even though this structure is relevant for learning morphosyntactic constraints and selectional preferences. We propose to model syntactic and morphological structure jointly in a dependency translation model, allowing the system to generalize to the level of morphemes. We present a dependency representation of German compounds and particle verbs that improves translation quality by 1.4–1.8 BLEU on the WMT English–German translation task.
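To illustrate what a dependency representation of a compound could look like, the sketch below attaches every segment of a split German compound to its final segment, since German compounds are head-final. It is a simplified, hypothetical representation, not the one described in the paper: the `Node` class, the labels `root` and `comp`, and the flat attachment of all modifiers to the head are assumptions, and linking morphemes (such as the *s* in *Arbeitsmarkt*) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One compound segment in a (hypothetical) dependency representation."""
    form: str
    label: str
    head: int  # index of the governing segment; -1 marks the compound's head


def compound_to_dependencies(segments: list[str]) -> list[Node]:
    """Attach all modifier segments to the last (head) segment.

    German compounds are head-final, so the final segment carries the
    features relevant for agreement and selectional preferences; it is the
    node that the rest of the sentence would attach to.
    """
    if not segments:
        raise ValueError("a compound needs at least one segment")
    head_index = len(segments) - 1
    nodes = []
    for i, segment in enumerate(segments):
        if i == head_index:
            nodes.append(Node(segment, "root", -1))
        else:
            # Simplification: modifiers attach directly to the head,
            # ignoring any internal bracketing of longer compounds.
            nodes.append(Node(segment, "comp", head_index))
    return nodes


for node in compound_to_dependencies(["Haus", "Tür", "Schlüssel"]):
    print(node.form, node.label, node.head)
# Haus comp 2
# Tür comp 2
# Schlüssel root -1
```

For *Haustürschlüssel* the true internal structure is [[Haus Tür] Schlüssel], so a fuller representation would need nested attachment; the flat version only captures which segment is the head.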

Highlights

  • When translating between two languages that differ in their degree of morphological synthesis, syntactic structures in one language may be realized as morphological structures in the other

  • When evaluating only the 165 sentences in newstest2014/5 whose reference contains a particle verb with a zu-infix, we observe an improvement of 0.8 BLEU on this subset (22.1→22.9), significant at p < 0.05

  • Our main contribution is that we exploit the hierarchical structure of morphemes to model them jointly with syntax in a dependency-based string-to-tree SMT model
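The subset evaluation in the second highlight requires picking out sentences whose reference contains a particle verb with a zu-infix. The summary does not say how those 165 sentences were identified, so the sketch below is a hypothetical string heuristic, not the authors' procedure. The particle list `PARTICLES` and the function names are assumptions; the list is deliberately incomplete, and the check can produce false positives (for example, nominalizations such as *Auszubildenden*) and miss particles not in the list. A morphological analyzer or parser annotations could be used instead.

```python
# Hypothetical, deliberately incomplete list of separable verb particles.
PARTICLES = ("ab", "an", "auf", "aus", "ein", "mit", "nach", "vor", "weg", "zurück")
PUNCTUATION = ".,;:!?\"'()"


def has_zu_infix(token: str, particles: tuple[str, ...] = PARTICLES) -> bool:
    """Heuristically test whether a token looks like particle + 'zu' + infinitive."""
    word = token.strip(PUNCTUATION).lower()
    for particle in particles:
        prefix = particle + "zu"
        # Require at least three characters after the infix, ending in -n
        # as German infinitives do (e.g. weg + zu + gehen).
        if word.startswith(prefix) and word.endswith("n") and len(word) >= len(prefix) + 3:
            return True
    return False


def select_zu_infix_subset(
    hypotheses: list[str], references: list[str]
) -> tuple[list[str], list[str]]:
    """Keep only the sentence pairs whose reference contains a zu-infix candidate."""
    keep = [
        i for i, ref in enumerate(references)
        if any(has_zu_infix(tok) for tok in ref.split())
    ]
    return [hypotheses[i] for i in keep], [references[i] for i in keep]
```

The filtered hypothesis and reference lists can then be scored with any corpus-level BLEU implementation to obtain a subset score.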


Summary

Introduction

When translating between two languages that differ in their degree of morphological synthesis, syntactic structures in one language may be realized as morphological structures in the other. Compounds in Germanic languages are head-final, and the head is the segment that determines agreement within the noun phrase and is relevant for the selectional preferences of verbs. Particle verbs show a related pattern: the German particle verb weggehen ('walk away') surfaces as one word or as two depending on its syntactic context, as the following English/German examples show:

  • he walks away quickly → er geht schnell weg
  • [...] because he walks away quickly → [...] weil er schnell weggeht
  • he can walk away quickly → er kann schnell weggehen
  • he promises to walk away quickly → er verspricht, schnell wegzugehen

We focus on the representation of German syntax and morphology in an English-to-German system, and on two morphologically complex word classes in German that are challenging for translation: compounds and particle verbs.
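The four German translations of the weggehen examples differ only in how the particle *weg* combines with the verb. The sketch below makes that pattern explicit by generating the surface form for each of those contexts. It is a hypothetical illustration, not the paper's post-processing: the function `realize_particle_verb` and its context names are assumptions, the caller must supply the correctly inflected verb form, and clause-level word order (in a main clause the particle goes to the end of the clause, not next to the verb) is left to the caller.

```python
def realize_particle_verb(particle: str, verb: str, context: str) -> list[str]:
    """Return the surface token(s) of a German particle verb in a given context.

    `verb` must already be inflected: a finite form ("geht") for the
    'main' and 'subordinate' contexts, an infinitive ("gehen") otherwise.
    """
    if context == "main":
        # Finite verb in a main clause: the particle is separated and
        # placed at the end of the clause (er geht schnell weg).
        return [verb, particle]
    if context in ("subordinate", "infinitive"):
        # Verb-final subordinate clause (weggeht) or bare infinitive,
        # e.g. after a modal verb (weggehen): particle and verb are joined.
        return [particle + verb]
    if context == "zu-infinitive":
        # The infinitive marker zu is infixed between particle and verb (wegzugehen).
        return [particle + "zu" + verb]
    raise ValueError(f"unknown context: {context!r}")


print(realize_particle_verb("weg", "geht", "main"))            # ['geht', 'weg']
print(realize_particle_verb("weg", "geht", "subordinate"))     # ['weggeht']
print(realize_particle_verb("weg", "gehen", "infinitive"))     # ['weggehen']
print(realize_particle_verb("weg", "gehen", "zu-infinitive"))  # ['wegzugehen']
```

Returning a list rather than a string keeps the separated main-clause form as two tokens, since particle and verb are not adjacent in that sentence.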

A Dependency Representation of Compounds and Particle Verbs
Tree Binarization
Post-Processing
Data and Models
SMT Results
Synthetic LM Experiment
Conclusion