Generation of Compound Words in Statistical Machine Translation into Compounding Languages

Sara Stymne, Lars Ahrenberg, Nicola Cancedda

doi:10.1162/coli_a_00162

Abstract

In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order, and show that it can lead to improvements, as judged both by direct inspection and by standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process in order to handle compounds.
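To make the split-merge idea concrete, the following is a minimal sketch in Python, not the method used in the article. It assumes a toy vocabulary of known words, a greedy two-part split, and a hypothetical marker `@@` appended to non-final compound parts so that the merging step can find them. The function names and the marker are illustrative assumptions; the article's own merging methods rely on heuristics, machine learning, and part-of-speech information, none of which is modeled here.

```python
from typing import List, Set

# Hypothetical marker appended to every non-final compound part (an assumption
# made for this sketch, not a convention taken from the article).
MARK = "@@"


def split_compound(word: str, vocab: Set[str], min_len: int = 3) -> List[str]:
    """Split `word` into two known parts if possible, else return it unchanged."""
    if word in vocab:
        return [word]
    # Try every split point that leaves at least `min_len` characters per part.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in vocab and tail in vocab:
            return [head + MARK, tail]
    return [word]


def split_sentence(tokens: List[str], vocab: Set[str]) -> List[str]:
    """Apply compound splitting to each token (done before SMT training)."""
    out: List[str] = []
    for tok in tokens:
        out.extend(split_compound(tok, vocab))
    return out


def merge_sentence(tokens: List[str]) -> List[str]:
    """Rejoin marked parts with the token that follows (done after translation)."""
    out: List[str] = []
    pending = ""
    for tok in tokens:
        if tok.endswith(MARK):
            # Accumulate the part without its marker and wait for the next token.
            pending += tok[: -len(MARK)]
        else:
            out.append(pending + tok)
            pending = ""
    if pending:
        # A marked part at the end of the sentence is kept as a plain word.
        out.append(pending)
    return out


if __name__ == "__main__":
    toy_vocab = {"regen", "schirm", "haus"}
    print(split_sentence(["der", "regenschirm"], toy_vocab))
    # Merging works on any marked sequence, including pairs never seen together.
    print(merge_sentence(["der", "haus" + MARK, "schirm"]))
```

Because merging only looks at the marker, it can join parts that never occurred together in the training data, which is how a novel compound can arise. It also shows why the ordering problem matters: if the translation step places a marked part away from its intended head, this naive merge attaches it to whatever token follows.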

Journal: Computational Linguistics
Publication Date: Jan 1, 2012
Citations: 24

