Abstract

In this paper, we investigate the role of morphology in phrase-based statistical machine translation from English to Slovenian, a highly inflectional language. Translation into an inflectional language is challenging because of its morphological complexity. Rich morphology increases data sparsity and degrades the quality of statistical machine translation. To address this issue, we added morphological information in the form of morphosyntactic description (MSD) tags attached to words. An MSD tag encodes all morphosyntactic information in position-dependent attributes. Tags were assigned to words by TreeTagger. Several experiments were performed using MSD tags to improve the translation results. First, factored translation was studied and different configurations were tested. The results show that factored translation improves the modeling of short-distance collocations. To capture long-distance dependencies, operation sequence models (OSM) were added in the second set of experiments, which yielded an additional improvement. The overall results show that the morphosyntactic information of an inflectional language is an important factor in translation. Factored translation with OSM brought a 9% relative improvement. The conclusions of our work can be generalized to other Balto-Slavic languages, as they share, to some extent, the same morphological characteristics.
DOI: http://dx.doi.org/10.5755/j01.itc.47.1.17887
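To illustrate what attaching MSD tags to words can look like in practice, the following is a minimal sketch and not the authors' actual pipeline. It assumes the pipe-separated `word|TAG` layout commonly used for factored training data, and the tag strings and attribute names are illustrative assumptions rather than values taken from the paper. The helper names `attach_factors` and `decode_msd` are hypothetical.

```python
def attach_factors(tokens, tags, sep="|"):
    """Join each surface token with its MSD tag, e.g. 'word|TAG'."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags))


def decode_msd(tag, attribute_names):
    """Read a position-dependent tag into named attributes.

    The first character is taken as the word category; each following
    position is matched to one attribute name. A '-' marks an unset position.
    """
    decoded = {"category": tag[0]}
    for name, value in zip(attribute_names, tag[1:]):
        if value != "-":
            decoded[name] = value
    return decoded


# Illustrative input (assumed, not from the paper): two Slovenian tokens
# with example tag strings, as a tagger might produce them.
tokens = ["velika", "hiša"]
tags = ["Agpfsn", "Ncfsn"]

factored_line = attach_factors(tokens, tags)

# Attribute order for nouns is an assumption made for this sketch.
noun_attributes = ["type", "gender", "number", "case"]
noun_info = decode_msd("Ncfsn", noun_attributes)
```

Keeping the tag as a separate factor, rather than fusing it into the word form, lets a translation system model the surface form and the morphosyntactic information separately, which is the idea behind the factored configurations described above.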
