Abstract

Machine translation based on neural networks has been shown to produce better results than other approaches. Building an effective neural machine translation (NMT) system requires a large, accurate bilingual corpus for training, as well as continual improvement of the methods and techniques used in the translation system. Despite its many advantages, current NMT systems still struggle to process long sentences [1]. In this paper, we propose a method for extracting bilingual phrases to build a phrase-aligned bilingual corpus. We also implement a long-sentence preprocessing technique for use in the NMT model. In our experiments, we trained an NMT system to translate Vietnamese into English using the proposed technique, and it achieved higher BLEU scores.
