Boosting the Transformer with the BERT Supervision in Low-Resource Machine Translation

Rong Yan, Xiaoming Wang, Guanglai Gao, Xiangdong Su, Jiang Li

doi:10.3390/app12147195

Open Access

https://doi.org/10.3390/app12147195

Journal: Applied Sciences	Publication Date: Jul 17, 2022
Citations: 6	License type: CC BY 4.0

Affiliation: Inner Mongolia University

Abstract

Previous work trained the Transformer and its variants end-to-end and achieved remarkable translation performance when large numbers of parallel sentences were available. However, these models suffer from data scarcity in low-resource machine translation tasks. To address the mismatch between the large model capacity of the Transformer and the small parallel training set, this paper adds BERT supervision on the latent representation between the encoder and the decoder of the Transformer and designs a multi-step training algorithm to boost the Transformer on that basis. The algorithm has three stages: (1) encoder training, (2) decoder training, and (3) joint optimization. We introduce a BERT model of the target language into encoder training and decoder training, which alleviates the data starvation problem of the Transformer. After training, BERT no longer takes part explicitly in inference. Another merit of our training algorithm is that it can further enhance the Transformer in tasks where parallel sentence pairs are limited but a large monolingual corpus of the target language is available. Evaluation results on six low-resource translation tasks suggest that the Transformer trained with our algorithm significantly outperforms baselines trained end-to-end as in previous work.
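
The three-stage schedule described in the abstract can be pictured with a small sketch. This is an illustrative outline only, not the authors' implementation: the names `Stage`, `training_schedule`, and `modules_at_inference` are hypothetical, and the choices of which module is updated in each stage, which data each stage consumes, and whether BERT supervision is absent from the joint stage are assumptions inferred from the abstract rather than details it states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str          # stage name as given in the abstract
    trainable: tuple   # modules assumed to be updated in this stage
    uses_bert: bool    # whether target-language BERT supplies supervision
    data: str          # kind of data the stage is assumed to consume


def training_schedule():
    """Hypothetical plan; everything beyond the stage names is an assumption."""
    return [
        # Assumption: only the encoder is updated, guided by BERT.
        Stage("encoder training", ("encoder",), True, "parallel"),
        # Assumption: only the decoder is updated; target-side monolingual
        # text may be usable here because BERT is a target-language model.
        Stage("decoder training", ("decoder",), True,
              "parallel or target monolingual"),
        # Assumption: both modules are fine-tuned together without BERT.
        Stage("joint optimization", ("encoder", "decoder"), False, "parallel"),
    ]


def modules_at_inference():
    # Per the abstract, BERT does not take part in inference explicitly.
    return ("encoder", "decoder")


for stage in training_schedule():
    print(f"{stage.name}: update {', '.join(stage.trainable)}; "
          f"BERT supervision = {stage.uses_bert}; data = {stage.data}")
```

The point of the layout is that BERT appears only as a training-time signal for the first two stages, so the deployed model has the same size and inference cost as a plain Transformer.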

Full Text