Research on the LSTM Mongolian and Chinese machine translation based on morpheme encoding

Ren Qing-Dao-Er-Ji, Yi La Su, Wan Wan Liu

doi:10.1007/s00521-018-3741-5

Abstract

Neural machine translation models based on long short-term memory (LSTM) have become mainstream in machine translation thanks to their encoder-decoder structure and their ability to capture semantic information. However, few studies have applied LSTM to Mongolian-Chinese neural machine translation. This paper studies the preprocessing of a Mongolian-Chinese bilingual corpus and the construction of an LSTM model based on Mongolian morpheme encoding. In the corpus preprocessing stage, it presents a hybrid algorithm for building the word segmentation module: sequences that have not been annotated are processed semantically and labeled by a combination of a gated recurrent unit (GRU) and a conditional random field (CRF). In the model construction stage, to learn more grammatical and semantic knowledge from the Mongolian corpus, the encoder is built as an LSTM neural network based on morpheme encoding, and an LSTM neural network decoder is built to predict the Chinese output. Experimental comparisons on sentences of different lengths using the constructed model show that it improves translation performance when handling long-term dependencies.

Full Text