Abstract

Machine translation is one of the most popular and most challenging tasks in Natural Language Processing. This paper proposes a self-attention based model for machine translation, named Re-Transformer, obtained by modifying the Transformer [1]. Unlike the prevailing approach of improving system performance by stacking more modules or GPUs, Re-Transformer modifies the basic architecture itself through four refinements. First, Re-Transformer adopts sub-word tokenization in corpus preprocessing to handle rare words. In the encoder, dual self-attention stacks and fewer point-wise feed-forward layers are used to obtain a better representation of the input sentence. In the decoder, the number of stacked decoder layers is reduced to speed up training and inference while keeping the same level of BLEU. Experimental results show that Re-Transformer achieves BLEU scores of 31.36 and 38.45 (four-layer decoder) and 32.14 and 55.62 (two-layer decoder), improving on the Transformer by around 4 and 17 BLEU points on the WMT 2014 English-German and English-French translation corpora.
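
To make the architectural changes concrete, the sketch below shows, under our own assumptions, what an encoder layer with two stacked self-attention sub-layers and a single point-wise feed-forward sub-layer could look like in PyTorch, together with a reduced (two-layer) decoder stack. The class name ReTransformerEncoderLayer and the hyperparameters (d_model=512, nhead=8, d_ff=2048) are illustrative placeholders, not values taken from the paper.

    import torch
    import torch.nn as nn

    class ReTransformerEncoderLayer(nn.Module):
        # Hypothetical sketch: two stacked self-attention sub-layers
        # followed by a single point-wise feed-forward sub-layer.
        # Names and sizes are illustrative, not from the paper.
        def __init__(self, d_model=512, nhead=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.attn1 = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
            self.attn2 = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.norm3 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, key_padding_mask=None):
            # First self-attention sub-layer with residual connection.
            a1, _ = self.attn1(x, x, x, key_padding_mask=key_padding_mask)
            x = self.norm1(x + self.drop(a1))
            # Second (dual) self-attention sub-layer stacked directly on top.
            a2, _ = self.attn2(x, x, x, key_padding_mask=key_padding_mask)
            x = self.norm2(x + self.drop(a2))
            # Single point-wise feed-forward sub-layer (fewer than the
            # one-per-attention-layer layout of the original Transformer).
            return self.norm3(x + self.drop(self.ff(x)))

    # A reduced decoder stack (e.g. two layers instead of six) can reuse
    # the standard Transformer decoder layer unchanged.
    decoder = nn.TransformerDecoder(
        nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=2,
    )

The key design choice this sketch reflects is spending more depth on self-attention (reading the source sentence) and less on feed-forward and decoder layers, which is how the paper reports gaining BLEU while speeding up training and inference.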
