Abstract

Neural Machine Translation (NMT) has achieved notable results on high-resource languages, but it still performs poorly on low-resource languages. It is now widely recognized that transfer learning methods are effective for low-resource translation. However, existing transfer learning methods are typically based on the parent-child architecture, which does not take full advantage of helpful languages. In this paper, inspired by human transitive inference and learning ability, we address this issue by proposing a new hierarchical transfer learning architecture for low-resource languages. In this architecture, the NMT model is trained on an unrelated high-resource language pair, a similar intermediate language pair, and the low-resource language pair in turn, and the parameters are transferred and fine-tuned layer by layer for initialization. In this way, the hierarchical transfer learning architecture combines the data volume advantage of high-resource languages with the syntactic similarity advantage of cognate languages. Specifically, we use Byte Pair Encoding (BPE) and character-level embeddings for data pre-processing, which effectively solves the out-of-vocabulary (OOV) problem. Experimental results on Uygur-Chinese and Turkish-English translation demonstrate the superiority of the proposed architecture over the NMT model with the parent-child architecture.
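The following minimal sketch illustrates the three-stage training schedule described above. It is an illustration only, not the authors' implementation: the Seq2SeqTransformer class, the dummy_batches generator, and all hyperparameter and learning-rate values are placeholder assumptions, and the random tensors stand in for real BPE-encoded parallel corpora.

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Toy encoder-decoder over a shared (joint-BPE) subword vocabulary."""
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model, nhead, num_layers, num_layers,
                                          batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        return self.out(self.transformer(self.embed(src), self.embed(tgt)))

def dummy_batches(vocab_size, n_batches, batch_size=16, seq_len=20):
    """Stand-in for a BPE-encoded parallel corpus; replace with real data loaders."""
    for _ in range(n_batches):
        src = torch.randint(1, vocab_size, (batch_size, seq_len))
        tgt = torch.randint(1, vocab_size, (batch_size, seq_len))
        yield src, tgt

def train_stage(model, batches, lr):
    """One layer of the hierarchy: same architecture, parameters initialized
    from the previous stage, fine-tuned on the new language pair."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for src, tgt in batches:
        logits = model(src, tgt[:, :-1])                      # teacher forcing
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tgt[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    return model

vocab = 8000
model = Seq2SeqTransformer(vocab)                               # random initialization
model = train_stage(model, dummy_batches(vocab, 50), lr=7e-4)   # unrelated high-resource pair
model = train_stage(model, dummy_batches(vocab, 20), lr=3e-4)   # similar intermediate pair
model = train_stage(model, dummy_batches(vocab, 10), lr=1e-4)   # low-resource pair
```

The point of the sketch is only that the same model object, and hence the same parameters, is carried from the unrelated high-resource pair to the similar intermediate pair and finally to the low-resource pair; in practice each stage would train for many epochs on real parallel data.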

Highlights

  • Language is the most important human communication tool and the main way people express themselves when communicating [1]

  • Our hierarchical transfer learning architecture improves the BLEU score by 1.95 points over the phrase-based SMT system, by 1.15 points over the Transformer-Big model, and by 0.58 points over the parent-child architecture based on the Transformer-Big model

  • The hierarchical transfer learning architecture applies the transfer learning method with the same hyperparameters at every stage, so that the model structure stays consistent across stages (see the sketch after this list)
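The last highlight can be pictured as a simple checkpoint hand-off: because every stage uses the same architecture hyperparameters, the state dict of the previous stage loads one-to-one into the next model. The sketch below is a hedged illustration, not the paper's implementation; it assumes PyTorch's built-in nn.Transformer, placeholder checkpoint file names, and illustrative hyperparameter values.

```python
import torch
import torch.nn as nn

# Identical architecture hyperparameters at every stage (illustrative values),
# so each checkpoint loads one-to-one into the next stage's model.
HPARAMS = dict(d_model=256, nhead=4, num_encoder_layers=3,
               num_decoder_layers=3, batch_first=True)

parent = nn.Transformer(**HPARAMS)              # trained on the high-resource pair
torch.save(parent.state_dict(), "parent.pt")

intermediate = nn.Transformer(**HPARAMS)        # same structure -> exact parameter match
intermediate.load_state_dict(torch.load("parent.pt"))
# ... fine-tune `intermediate` on the similar intermediate pair, then hand off again:
torch.save(intermediate.state_dict(), "intermediate.pt")

child = nn.Transformer(**HPARAMS)
child.load_state_dict(torch.load("intermediate.pt"))
# ... fine-tune `child` on the low-resource pair.
```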


Summary

INTRODUCTION

Language is the most important human communication tool and the main way people express themselves when communicating [1]. Because of the complexity of the network and the large number of parameters, NMT models depend heavily on the quality and availability of extensive parallel corpora. For this reason, NMT models still perform poorly on most low-resource languages compared with Statistical Machine Translation (SMT) [9], [10]. Our contributions are as follows: 1) We propose a new hierarchical transfer learning architecture that combines the data volume advantage of high-resource languages with the syntactic similarity advantage of similar languages by adding an intermediate layer. 3) We verify the generalization of the hierarchical transfer learning architecture by applying it to different low-resource languages. 4) Experimental results show that our architecture significantly improves translation performance on low-resource languages compared with the parent-child architecture, the NMT system based on the Transformer-Big model, and the phrase-based SMT model.

RELATED WORK
RESULTS AND ANALYSIS
CONCLUSION