Abstract

In recent years, neural machine translation has become the mainstream approach in the field of machine translation, but low-resource translation still faces the challenges of insufficient parallel corpora and data sparseness. Existing machine translation models are usually trained on datasets segmented at word granularity. However, different segmentation granularities carry different grammatical and semantic features and information, and considering word granularity alone restricts the efficient training of neural machine translation systems. To address the data sparseness caused by the lack of Uyghur-Chinese parallel corpora and by complex Uyghur morphology, this paper proposes a multistrategy training method over several segmentation granularities, namely syllables, marked syllables, words, and syllable-word fusion, and, to overcome the shortcomings of traditional recurrent and convolutional neural networks, builds a Uyghur-Chinese neural machine translation model based entirely on the multihead self-attention mechanism of the Transformer. Results on the CCMT2019 Uyghur-Chinese bilingual dataset show that the multigranularity training method performs significantly better than translation systems trained on any single segmentation granularity, and that the Transformer model obtains a higher BLEU score than a Uyghur-Chinese translation model based on Self-Attention-RNN.
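The BLEU comparison reported above can be reproduced with a standard scorer. The snippet below is only a minimal sketch using the sacrebleu package on made-up example sentences; it is not the authors' evaluation setup or data.

import sacrebleu

# Made-up system output and reference; in practice these would be the
# detokenized Transformer translations and the CCMT2019 reference sentences.
hypotheses = ["the transformer model obtains better translation quality"]
references = [["the transformer model achieves better translation quality"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")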

Highlights

  • Machine translation can be divided into rule-based machine translation, instance-based machine translation, statistics-based machine translation, and neural network-based machine translation [1]. Both statistical machine translation and neural machine translation rely on large-scale bilingual parallel corpora. The Transformer [2] model used in this paper translates well for resource-rich languages, but in low-resource translation tasks such as Uyghur, the parallel corpus is insufficient to meet the training needs of the Transformer model

  • Transformer Model. The Transformer model relies on the attention mechanism and uses an encoder-decoder architecture, but its structure is more complex than a plain attention layer. The encoding end is composed of 6 stacked encoders, and the decoding end likewise consists of 6 stacked decoders (a minimal instantiation is sketched after this list)

  • We study a multigranularity segmentation training method for resource-scarce Uyghur-Chinese translation. Through syllables, words, and syllable-word fusion, it can effectively handle the translation of prepositions and conjunctions that appear in Chinese but have no counterpart in Uyghur, and it avoids translation difficulties at the lexical and syntactic levels

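As referenced in the second highlight, the stack of 6 encoders and 6 decoders can be written down directly with the standard torch.nn.Transformer module. This is only an illustrative sketch: the model dimension, number of heads, and feed-forward size are assumptions taken from the original Transformer configuration, not values reported in this paper.

import torch
import torch.nn as nn

# Encoder-decoder Transformer with 6 stacked encoder layers and 6 stacked
# decoder layers, as described in the highlight above.
model = nn.Transformer(
    d_model=512,            # assumed embedding size
    nhead=8,                # assumed number of attention heads
    num_encoder_layers=6,   # encoder stack depth
    num_decoder_layers=6,   # decoder stack depth
    dim_feedforward=2048,   # assumed feed-forward hidden size
)

src = torch.rand(20, 1, 512)   # (source length, batch, d_model) dummy embeddings
tgt = torch.rand(15, 1, 512)   # (target length, batch, d_model) dummy embeddings
out = model(src, tgt)
print(out.shape)               # torch.Size([15, 1, 512])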

Summary

Introduction

Both statistical machine translation and neural machine translation rely on large-scale bilingual parallel corpora. The Transformer [2] model used in this paper translates well for resource-rich languages, but in low-resource translation tasks such as Uyghur, the parallel corpus is insufficient to meet the training needs of the Transformer model. The encoder on the left of Figure 1 is composed of a multihead attention network and a simple fully connected feed-forward neural network; each encoder layer contains two sublayers, namely a multihead self-attention mechanism and a fully connected feed-forward network. Encoder-decoder attention is used for translation and alignment, and both the encoder and the decoder use multihead self-attention to learn representations of the text [6, 7]. The attention computation is mainly divided into three steps [8]: first, the query and key are used to compute similarity weights; second, the Softmax function normalizes these weights; third, the normalized weights are used to compute a weighted sum over the values. The feed-forward sublayer computes FFN(x) = max(0, xW1 + b1)W2 + b2, where x is the input, W1 and b1 are the parameter matrix and offset vector of the first linear transformation, and W2 and b2 are the parameter matrix and offset vector of the second linear transformation [14].
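The following is a minimal NumPy sketch of the three attention steps and the feed-forward sublayer described above, written for a single attention head; the token count and layer sizes are arbitrary assumptions, and the real model applies this per head with learned projections.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Step 1: query-key similarity, scaled by the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Step 2: Softmax normalization of the similarity weights.
    weights = softmax(scores, axis=-1)
    # Step 3: weighted sum over the values.
    return weights @ V

def feed_forward(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

# Dummy shapes: 4 tokens, model size 8, feed-forward size 16 (assumed values).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
attn_out = scaled_dot_product_attention(x, x, x)   # self-attention over the same sequence
ffn_out = feed_forward(attn_out,
                       rng.normal(size=(8, 16)), np.zeros(16),
                       rng.normal(size=(16, 8)), np.zeros(8))
print(attn_out.shape, ffn_out.shape)               # (4, 8) (4, 8)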

Multigranularity Segmentation
Test and Result Analysis
Findings
Conclusion
