Abstract

Both the statistical machine translation (SMT) model and the neural machine translation (NMT) model are representative models in Uyghur–Chinese machine translation tasks, each with its own merits. Combining their advantages is therefore a promising direction for further improving translation performance. In this paper, we present a hybrid system combination framework for the Uyghur–Chinese machine translation task that works in three layers to achieve better translation results. In the first layer, we construct various machine translation systems, including SMT and NMT. In the second layer, the outputs of the multiple systems are combined to leverage the advantages of the SMT and NMT models, using a multi-source-based system combination approach and voting-based system combination approaches. Moreover, instead of selecting an individual system's combined outputs as the final results, we feed the outputs of the first and second layers into the final layer to make a better prediction. Experimental results on the Uyghur–Chinese translation task show that the proposed framework significantly outperforms the baseline systems in terms of both accuracy and fluency, improving on the best individual system by 1.75 BLEU points and on conventional system combination methods by 0.66 BLEU points.

Introduction

Machine translation (MT) is an important task in the natural language processing (NLP) field. The most popular Uyghur–Chinese machine translation methods can be divided into two categories: statistical machine translation (SMT) [1,2] and neural machine translation (NMT) [3,4,5]. SMT remains in use partly because NMT models require large training datasets and have the most problems with rare words. However, the SMT workflow must be executed by multiple separately tuned components, such as word alignment, translation rule extractors, and other feature extractors, which makes it more complex and introduces error propagation into the training pipeline [7]. In contrast, since distributed representations of a language can be learned through the end-to-end training of an NMT system, the NMT model avoids these problems of the SMT training process, and NMT can produce more fluent results [8,9,10].
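The abstract describes the three-layer framework only at a high level, so the following Python sketch is just one illustrative way such a pipeline could be wired together. The system stubs, the word-overlap consensus score, and the reuse of the same voting routine in the second and third layers are simplifying assumptions for illustration, not the authors' actual multi-source or voting-based combination models.

```python
from collections import Counter
from typing import Callable, Dict, List

# A layer-1 system maps a Uyghur source sentence to a Chinese hypothesis.
# Real systems would be trained SMT/NMT decoders; the stubs at the bottom
# are only placeholders.
TranslationSystem = Callable[[str], str]


def consensus_score(hyp: str, others: List[str]) -> float:
    """Average word-overlap of `hyp` with the other hypotheses.

    A deliberately crude stand-in for the sentence-level similarity a
    voting-based combination might use; the paper does not prescribe
    this exact metric.
    """
    hyp_tokens = hyp.split()
    hyp_counts = Counter(hyp_tokens)
    if not others:
        return 0.0
    total = 0.0
    for other in others:
        overlap = sum((hyp_counts & Counter(other.split())).values())
        total += overlap / max(len(hyp_tokens), 1)
    return total / len(others)


def vote_combine(hypotheses: Dict[str, str]) -> str:
    """Voting-style combination: keep the hypothesis closest to the consensus."""
    best = max(
        hypotheses,
        key=lambda name: consensus_score(
            hypotheses[name],
            [h for n, h in hypotheses.items() if n != name],
        ),
    )
    return hypotheses[best]


def three_layer_translate(src: str, systems: Dict[str, TranslationSystem]) -> str:
    # Layer 1: one candidate translation per individual SMT/NMT system.
    layer1 = {name: system(src) for name, system in systems.items()}

    # Layer 2: combine the layer-1 candidates. Only a voting-style combination
    # is sketched here; the paper also uses a multi-source combination model.
    layer2 = {"layer2_voting": vote_combine(layer1)}

    # Layer 3: select the final output from the layer-1 AND layer-2 candidates
    # together, instead of committing to any single combined output.
    return vote_combine({**layer1, **layer2})


if __name__ == "__main__":
    # Canned outputs keep the sketch runnable without any trained models.
    systems: Dict[str, TranslationSystem] = {
        "smt": lambda src: "他 昨天 去 了 学校",
        "nmt_rnn": lambda src: "昨天 他 去 学校 了",
        "nmt_transformer": lambda src: "他 昨天 去 了 学校",
    }
    print(three_layer_translate("<Uyghur source sentence>", systems))
```

The point of the third layer in this sketch is purely structural: the final selection sees both the raw system outputs and the combined output, mirroring the idea of not trusting a single combined result as the final answer.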
