Abstract

This paper presents the development of a cascaded hybrid multilingual automatic translation system that tightly couples the two underlying research approaches in machine translation, namely the neural (deterministic) approach and the statistical (probabilistic) approach, while fully taking advantage of each method in order to improve translation performance. This architecture addresses two major problems that frequently occur when dealing with morphologically richer languages in MT: the significant number of unknown tokens generated due to the presence of out-of-vocabulary (OOV) words, and the size of the output vocabulary. Additionally, we incorporate factors (additional word-level linguistic information) to alleviate the data sparseness problem and reduce language ambiguity; the factors we consider are lemmas and part-of-speech tags (taking their various compounds into consideration). We combine a fully-factored Transformer and a factored PB-SMT system: the training data is pre-translated using the trained fully-factored Transformer and then employed to build a PB-SMT system, while the pre-translated development set is used in parallel to tune its parameters. Finally, to produce the final output, the FPB-SMT system re-decodes the pre-translated test set in a post-processing step. Experiments on Japanese-to-English and English-to-Japanese translation reveal that our proposed cascaded hybrid framework outperforms a strong state-of-the-art HMT baseline by over 8.61% and 7.25% BLEU, respectively, on the validation set, and by over 8.70% and 7.70% BLEU, respectively, on the test set.
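The cascade described above reduces to three steps: pre-translate the source side of the training, development and test data with the trained fully-factored Transformer; build and tune the factored PB-SMT system on the pre-translated training and development sets; and re-decode the pre-translated test set with that PB-SMT system. The following minimal Python sketch illustrates this flow only; the callables `translate`, `train_pbsmt` and `redecode` are hypothetical stand-ins for the trained Transformer and a PB-SMT toolkit, not APIs defined in the paper.

```python
def cascade(translate, train_pbsmt, redecode,
            train_src, train_tgt, dev_src, dev_tgt, test_src):
    """Sketch of the Fully-Factored Transformer => Factored PB-SMT cascade.

    `translate`, `train_pbsmt` and `redecode` are caller-supplied callables
    wrapping the trained Transformer and the PB-SMT toolkit (placeholders).
    """
    # Step 1: pre-translate the source side of train/dev/test data
    # with the fully-factored Transformer.
    pre_train = [translate(s) for s in train_src]
    pre_dev = [translate(s) for s in dev_src]
    pre_test = [translate(s) for s in test_src]

    # Step 2: build the factored PB-SMT system on (pre-translation, reference)
    # pairs and tune its weights on the pre-translated development set.
    smt = train_pbsmt(pre_train, train_tgt, pre_dev, dev_tgt)

    # Step 3: re-decode the pre-translated test set in a post-processing step
    # to produce the final output.
    return [redecode(smt, s) for s in pre_test]
```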

Highlights

  • Machine translation has seen an improvement in state-of-the-art performance through the introduction of Transformers [1], a new paradigm in Neural Machine Translation (NMT) [2] [3] powered by sequence-to-sequence learning frameworks, which has since rivaled the factored statistical machine translation paradigm [4] that achieved the state of the art in SMT frameworks [5] [6]

  • Our generalized model supports an arbitrary number of input features. In this paper we focus on a number of well-known linguistic features, with the empirical question of determining to what extent providing linguistic features to both the encoder and the decoder improves translation quality, especially for morphologically richer languages, under the Transformer paradigm (a minimal embedding sketch follows this list)

  • We have proposed a novel HMT framework cascaded as a Fully-Factored Transformer ⇒ Factored SMT pipeline, integrating linguistic factors on both the source and target sides of the Transformer model, and linguistic factors on the source side of the SMT model
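For illustration of how an encoder or decoder can consume an arbitrary number of word-level factors, the PyTorch sketch below gives each factor (surface form, lemma, POS, ...) its own embedding table and concatenates the per-token embeddings before they enter the Transformer. This is one common scheme, not necessarily the paper's exact implementation, and all vocabulary sizes and dimensions in the usage example are made-up placeholders.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Embed several word-level factors and concatenate them per token."""

    def __init__(self, vocab_sizes, dims):
        super().__init__()
        # One embedding table per factor (surface form, lemma, POS, ...).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(v, d) for v, d in zip(vocab_sizes, dims)]
        )

    def forward(self, factor_ids):
        # factor_ids: list of (batch, seq_len) LongTensors, one per factor.
        vectors = [emb(ids) for emb, ids in zip(self.embeddings, factor_ids)]
        # Concatenate along the feature dimension to form one token vector.
        return torch.cat(vectors, dim=-1)

# Hypothetical sizes: surface (10k types, 448 dims), lemma (8k, 48), POS (40, 16)
# give a 512-dimensional token representation for the Transformer encoder.
embed = FactoredEmbedding([10000, 8000, 40], [448, 48, 16])
ids = [torch.zeros(2, 5, dtype=torch.long) for _ in range(3)]
vectors = embed(ids)  # shape: (2, 5, 512)
```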

Introduction

Machine translation has seen an improvement in state-of-the-art performance through the introduction of Transformers [1], a new paradigm in Neural Machine Translation (NMT) [2] [3] powered by sequence-to-sequence learning frameworks, which has since rivaled the factored statistical machine translation paradigm [4] that achieved the state of the art in SMT frameworks [5] [6]. For low-resource (small-corpus) and morphologically rich language conditions, incorporating various linguistic annotations alongside the surface-level words has been found to resolve semantic ambiguities and data sparseness, leading to better translation of rare or OOV words and greater generalization capacity, as illustrated by [4], which addressed this issue for the traditional SMT architecture [7] by proposing the factored translation model. These linguistic annotations, or factors, include features such as lemmas, stems, morphological classes, roots, data-driven clusters, part-of-speech tags, constituency parses and compounds. With the aim of alleviating data sparseness and reducing language ambiguity, such extra features can be of enormous benefit when added to both NMT and phrase-based SMT frameworks.
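As a concrete illustration of factored input, the snippet below annotates each surface token with a lemma and a POS factor in the pipe-separated style popularized by factored Moses-style systems; the example sentence and tags are hypothetical and not taken from the paper's data.

```python
# Hypothetical example: each surface token carries a lemma and a POS factor,
# written in the pipe-separated "word|lemma|POS" style used by factored
# Moses-style systems.
tokens = [("cats", "cat", "NNS"), ("were", "be", "VBD"), ("sleeping", "sleep", "VBG")]

def join_factors(token_factors):
    """Serialize per-token factor tuples into one factored training line."""
    return " ".join("|".join(factors) for factors in token_factors)

print(join_factors(tokens))  # -> "cats|cat|NNS were|be|VBD sleeping|sleep|VBG"
```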
