Abstract

Viewing machine translation (MT) as a structured classification problem has provided a gateway for a host of structured prediction techniques to enter the field. In particular, large-margin methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of existing methods such as MERT. One general issue with these approaches is the difficulty of obtaining fully structured labels, e.g. in MT, obtaining reference translations or parallel sentence corpora for arbitrary language pairs. Another issue, more specific to the translation domain, is the difficulty of online training and updating of MT systems, since existing methods often require bilingual knowledge to correct translation outputs online. This problem is an important one, especially given the use of MT in the mobile domain: in the process of translating user inputs, these systems can also receive feedback from the user on the quality of the translations produced. We propose a solution to both problems by demonstrating a principled way to incorporate binary-labeled feedback (i.e. feedback on whether a translation hypothesis is a "good" or understandable one or not) into an MT framework; this form of supervision can be easily integrated in an online and monolingual manner. Experimental results on Chinese–English and Arabic–English corpora for both sparse and dense feature sets show marked improvements from incorporating binary feedback on unseen test data, with gains in some cases exceeding 5.5 BLEU points. Experiments with human evaluators providing feedback correspond reasonably well with the larger-scale, synthetic experiments and underline the relative ease with which binary feedback for translation hypotheses can be collected, in comparison to parallel data.
