Abstract

Machine translation is one of the most popular and most challenging tasks in Natural Language Processing. This paper proposes a self-attention based model for machine translation, named Re-Transformer, obtained by modifying the Transformer [1]. Unlike the prevailing approach of improving system performance by stacking more modules or GPUs, Re-Transformer modifies the basic architecture itself through four refinements. First, Re-Transformer adopts sub-word tokenization in corpus preprocessing to handle rare words. In the encoder, dual self-attention stacks and fewer point-wise feed-forward layers are used to obtain a better representation of the input sentence. In the decoder, the number of stacked decoder layers is reduced to speed up training and inference while keeping the same level of BLEU. Experimental results show that Re-Transformer achieves BLEU scores of 31.36 and 38.45 (four-layer decoder) and 32.14 and 55.62 (two-layer decoder), improving on the Transformer by around 4 and 17 BLEU points on the WMT 2014 English-German and English-French translation corpora.
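
To make the architectural changes concrete, the sketch below shows, under our own assumptions, what an encoder layer with two stacked self-attention sub-layers and a single point-wise feed-forward sub-layer could look like in PyTorch, together with a reduced (two-layer) decoder stack. The class name ReTransformerEncoderLayer and the hyperparameters (d_model=512, nhead=8, d_ff=2048) are illustrative placeholders, not values taken from the paper.

    import torch
    import torch.nn as nn

    class ReTransformerEncoderLayer(nn.Module):
        # Hypothetical sketch: two stacked self-attention sub-layers
        # followed by a single point-wise feed-forward sub-layer.
        # Names and sizes are illustrative, not from the paper.
        def __init__(self, d_model=512, nhead=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.attn1 = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
            self.attn2 = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.norm3 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, key_padding_mask=None):
            # First self-attention sub-layer with residual connection.
            a1, _ = self.attn1(x, x, x, key_padding_mask=key_padding_mask)
            x = self.norm1(x + self.drop(a1))
            # Second (dual) self-attention sub-layer stacked directly on top.
            a2, _ = self.attn2(x, x, x, key_padding_mask=key_padding_mask)
            x = self.norm2(x + self.drop(a2))
            # Single point-wise feed-forward sub-layer (fewer than the
            # one-per-attention-layer layout of the original Transformer).
            return self.norm3(x + self.drop(self.ff(x)))

    # A reduced decoder stack (e.g. two layers instead of six) can reuse
    # the standard Transformer decoder layer unchanged.
    decoder = nn.TransformerDecoder(
        nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=2,
    )

The key design choice this sketch reflects is spending more depth on self-attention (reading the source sentence) and less on feed-forward and decoder layers, which is how the paper reports gaining BLEU while speeding up training and inference.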
