Abstract

While building the first Russian-Vietnamese neural machine translation system, we faced the problem of choosing a translation unit system on which the source and target embeddings are based. The available homogeneous translation unit systems, which use the same translation unit on the source and target sides, do not suit the investigated language pair well. To solve this problem, we propose a novel heterogeneous translation unit system that takes into account the linguistic characteristics of the synthetic Russian language and the analytic Vietnamese language. Specifically, we decrease the embedding level on the source side by splitting tokens into subtokens, and we increase the embedding level on the target side by merging neighboring tokens into supertokens. The experimental results show that the proposed heterogeneous system improves over the best existing homogeneous Russian-Vietnamese translation system by 1.17 BLEU. Our approach could be applied to building translation bots for language pairs with different linguistic characteristics.

Highlights

  • No researchers have addressed the problem of neural machine translation for the Russian-Vietnamese language pair. The primary aim of our work is to study neural machine translation for this language pair. Therefore, we attempt to build and analyze Russian-Vietnamese neural machine translation systems

  • We observe a tendency that, on the Russian source side, the subtoken is preferable to the token as the translation unit. The average BLEU score of neural machine translation (NMT) models with the token as the translation unit on the Russian source side is (34.45 + 33.44 + 31.50)/3 ≈ 33.13 BLEU

  • Based on our linguistic understanding of the morphologically rich Russian language and the analytic, noninflectional Vietnamese language, we propose a novel mixed-level model for translating from Russian to Vietnamese. The mixed-level model uses subtokens as the input and supertokens as the output
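
The mixed-level idea in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy longest-match splitting, the greedy lexicon-based merging, the underscore joiner, and the names split_into_subtokens, merge_into_supertokens, subtoken_vocab, and supertoken_lexicon are all assumptions made for illustration, and the tiny vocabularies are hand-picked toy data.

```python
def split_into_subtokens(token, subtoken_vocab):
    """Greedy longest-match split of one source token into subtokens.

    Assumption: a real system would learn its subtoken inventory from data;
    here `subtoken_vocab` is a hand-written toy set.
    """
    pieces = []
    start = 0
    while start < len(token):
        # Try the longest candidate first; fall back to a single character.
        for end in range(len(token), start, -1):
            piece = token[start:end]
            if piece in subtoken_vocab or end == start + 1:
                pieces.append(piece)
                start = end
                break
    return pieces


def merge_into_supertokens(tokens, supertoken_lexicon, max_len=3):
    """Greedy left-to-right merge of neighboring target tokens.

    Assumption: `supertoken_lexicon` is a toy set of token tuples that
    should be treated as one unit; merged units are joined with "_".
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Prefer the longest span found in the lexicon; keep single tokens as-is.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in supertoken_lexicon:
                merged.append("_".join(candidate))
                i += n
                break
    return merged


# Toy usage: a Russian source token is split, Vietnamese target tokens are merged.
source_units = split_into_subtokens("переводчик", {"перевод", "чик"})
target_units = merge_into_supertokens(
    ["hệ", "thống", "dịch", "máy"],
    {("hệ", "thống"), ("dịch", "máy")},
)
print(source_units)
print(target_units)
```

In this sketch the source sequence becomes longer and more fine-grained while the target sequence becomes shorter and more coarse-grained, which is the heterogeneous setup the highlight describes.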


Summary

Introduction

No researchers have addressed the problem of neural machine translation for the Russian-Vietnamese language pair. The primary aim of our work is to study neural machine translation for this language pair. Therefore, we attempt to build and analyze Russian-Vietnamese neural machine translation systems. One of the first problems we faced when building a Russian-Vietnamese neural machine translation (NMT) system was choosing a suitable embedding level. There are different translation unit systems on which the embedding vectors can be based. We use two terminology systems for the translation unit, one from the technical point of view and one from the linguistic point of view. The technical terminology applies to both the Russian source language and the Vietnamese target language, whereas the linguistic classifications differ between the two languages.

