Abstract

Nearly all of the work in neural machine translation (NMT) is limited to a quite restricted vocabulary, crudely treating all other words the same as an < unk > symbol. For the translation of language with abundant morphology, unknown (UNK) words also come from the misunderstanding of the translation model to the morphological changes. In this study, we explore two ways to alleviate the UNK problem in NMT: a new generative adversarial network (added value constraints and semantic enhancement) and a preprocessing technique that mixes morphological noise. The training process is like a win-win game in which the players are three adversarial sub models (generator, filter, and discriminator). In this game, the filter is to emphasize the discriminator’s attention to the negative generations that contain noise and improve the training efficiency. Finally, the discriminator cannot easily discriminate the negative samples generated by the generator with filter and human translations. The experimental results show that the proposed method significantly improves over several strong baseline models across various language pairs and the newly emerged Mongolian-Chinese task is state-of-the-art.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call