Improving Statistical Machine Translation using Syntax-based Learning-to-Rank System

Saeed Farzi, Heshaam Faili

doi:10.1093/llc/fqv032

Abstract

Word reordering is one of the fundamental problems of machine translation and an important factor in translation quality and efficiency; tackling it can lead to significant improvements in translation quality. This paper introduces a new method for the reordering problem that exploits sophisticated syntax-based features to re-rank the n-best translation candidates produced by a phrase-based statistical machine translation system. These reordering features are based on an innovative structure named the phrasal dependency tree, which is inspired by target-side dependency relations among contiguous non-syntactic phrases. The features draw on phrase dependencies, translation orientation, and distance. Each translation candidate is modelled as a directed, weighted graph built from the information provided by the reordering features and is re-scored by the proposed re-ranking system. The system markedly improves machine translation output for two syntactically divergent language pairs. Performance is evaluated on Persian→English and German→English translation tasks using the WMT07 benchmark. The results show improvements of 0.566/0.95/0.011 and 0.75/0.97/0.024 points in terms of BLEU/TER/LRSCORE on the Persian→English and German→English tasks, respectively. The superiority of the proposed system in terms of precision and recall is also demonstrated.
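
The general idea of re-ranking an n-best list can be illustrated with a minimal sketch. It is not the authors' system: it assumes each candidate already carries a decoder score and a dictionary of reordering feature values, and it combines them with a plain weighted sum using hand-set weights, whereas the paper learns its ranking function and builds its features from the phrasal dependency tree. The names `Candidate`, `rerank`, `base_score`, and the feature keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    text: str
    base_score: float           # hypothetical decoder score for this candidate
    features: Dict[str, float]  # hypothetical reordering feature values


def rerank(nbest: List[Candidate],
           weights: Dict[str, float],
           base_weight: float = 1.0) -> List[Candidate]:
    """Return the n-best list sorted by a weighted linear score, best first.

    Assumption: a simple linear model stands in for the learned ranker.
    Features without a weight contribute nothing to the score.
    """
    def score(c: Candidate) -> float:
        feature_part = sum(weights.get(name, 0.0) * value
                           for name, value in c.features.items())
        return base_weight * c.base_score + feature_part

    return sorted(nbest, key=score, reverse=True)


# Toy n-best list with made-up scores and feature values.
nbest = [
    Candidate("candidate A", -1.0, {"orientation": 0.2, "distance": -3.0}),
    Candidate("candidate B", -1.2, {"orientation": 0.9, "distance": -1.0}),
]
weights = {"orientation": 1.0, "distance": 0.5}
reranked = rerank(nbest, weights)
```

With these toy values the decoder alone prefers candidate A, while the weighted reordering features move candidate B to the top, which is the kind of change a re-ranker is meant to make.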

Full Text