Abstract

Many errors in phrase-based SMT can be attributed to problems on three linguistic levels: morphological complexity in the target language, structural differences and lexical choice. We explore combinations of linguistically motivated approaches to address these problems in English-to-German SMT and show that they are complementary to one another, but also that the popular verbal pre-ordering can cause problems on the morphological and lexical level. A discriminative classifier can overcome these problems, in particular when enriching standard lexical features with features geared towards verbal inflection.

Highlights

  • Many of the errors occurring in SMT can be attributed to problems on three linguistic levels: morphological richness, structural differences between source and target language, and lexical choice

  • We explore system variants that combine target-side morphological modeling, structural adaptation between source and target side and a discriminative lexicon enriched with features relevant for support verb constructions and verbal inflection

  • We show that the components targeting the different linguistic levels are complementary, and that applying only verbal pre-ordering can introduce problems on the morpho-lexical level; our experiments indicate that a discriminative classifier can overcome these problems

Summary

Introduction and Motivation

Many of the errors occurring in SMT can be attributed to problems on three linguistic levels: morphological richness, structural differences between source and target language, and lexical choice. Often, these categories are intertwined: for example, the syntactic function of an argument can be expressed on the morphological level by grammatical case (e.g. in German), or on the syntactic level through word ordering (such as SVO in English).

Combining Approaches

Individual strategies aiming at one linguistic level are established and usually improve translation, but it is not clear (i) whether individual gains add up when combining approaches and (ii) how individually targeting one linguistic level impacts other levels. We address these questions for the combined strategies of source-side reordering (pre-processing), a discriminative classifier (at decoding time), and target-side generation of nominal inflection (post-processing).

With too large a gap between "cut" and "interest rates", it becomes difficult to disambiguate "cut", leading to the wrong translation schneiden ('to cut with a knife').
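The effect of distance on disambiguation can be illustrated with a minimal sketch. It assumes that lexical choice only sees a bounded window of neighbouring source words; the function `window_context`, the window size of 2, and both example sentences are hypothetical and are not taken from the paper. The second sentence imitates a verb-final reordering, which is one way such a gap could arise.

```python
def window_context(tokens, index, size):
    """Return the tokens within `size` positions of tokens[index], excluding it."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


# Invented example: verb and object are adjacent in the original English order.
original = "the bank will cut interest rates again next year".split()

# Invented example: the verb is moved to clause-final position,
# imitating German verb placement (an assumption for illustration only).
reordered = "the bank will interest rates again next year cut".split()

WINDOW = 2  # assumed context size, chosen only to make the effect visible

near = window_context(original, original.index("cut"), WINDOW)
far = window_context(reordered, reordered.index("cut"), WINDOW)

# The disambiguating noun is visible only when it is close to the verb.
print("rates" in near)
print("rates" in far)
```

In the first sentence the window around "cut" contains "rates"; in the second it does not, so a purely local model has no evidence for preferring the financial sense. Context features that reach beyond such a window are one way to recover that evidence.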

Morpho-Syntactic Modeling
Context Features for Lexical Modeling
Experiments and Results
Conclusion