Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language

Mirjam Sepesy Maučec, Janez Brest

doi:10.15388/informatica.2010.275

Abstract

We address the problem of statistical machine translation from a highly inflective language to a less inflective one. Statistical machine translation systems generally do not take the characteristics of inflective languages into account. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although some interdependencies exist. On the other hand, we know that if we reduce inflected word forms to common lemmas, some information is lost. It would be reasonable to eliminate only those variations in inflected word forms that are not relevant for translation. Inflectional features of words are defined by morpho-syntactic description (MSD) tags, and we want to reduce them. To do this, explicit knowledge about both languages (source and target) is needed. The idea of the paper is to find the information-bearing MSDs in the source language by a data-driven approach. The task is performed by a global optimization algorithm named Differential Evolution. The experiments were performed using the freely available parallel English-Slovenian corpus SVEZ-IJS, which is lemmatized and annotated with MSD tags. The results show a promising direction toward an optimal subset of morpho-syntactic features.
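The search described in the abstract can be illustrated with a minimal sketch of Differential Evolution applied to feature-subset selection. This is not the authors' implementation: it uses the textbook DE/rand/1/bin scheme, thresholds each real-valued gene at 0.5 to obtain a keep/drop mask, and replaces translation quality with an invented toy objective. The feature names, the `USEFUL` set, and all parameter values are assumptions made purely for illustration.

```python
import random


def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimise `fitness` over [0, 1]^dim with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to the search box
                else:
                    trial.append(pop[i][j])
            trial_score = fitness(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def decode(vector, threshold=0.5):
    """Map a real-valued vector to a keep/drop mask over features."""
    return [x >= threshold for x in vector]


# ASSUMPTION: illustrative feature names, not the paper's MSD inventory.
FEATURES = ["case", "number", "gender", "person", "degree", "animacy"]
# ASSUMPTION: invented "ground truth" so the toy objective has an optimum.
USEFUL = {"number", "person", "degree"}


def toy_fitness(vector):
    """Toy stand-in for translation quality: count wrongly kept/dropped features."""
    mask = decode(vector)
    return sum(1 for name, keep in zip(FEATURES, mask) if keep != (name in USEFUL))


if __name__ == "__main__":
    best, score = differential_evolution(toy_fitness, dim=len(FEATURES))
    kept = [name for name, keep in zip(FEATURES, decode(best)) if keep]
    print("kept features:", kept, "mismatches:", score)
```

In a real setting, the toy objective would be replaced by a measure of translation quality obtained with the reduced tag set, which makes each fitness evaluation far more expensive than in this sketch.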

Journal: Informatica
Publication Date: Jan 1, 2010
Citations: 23
