Evaluation of English–Slovak Neural and Statistical Machine Translation

Lucia Benkova, Dasa Munkova, Michal Munk, Ľubomír Benko

doi:10.3390/app11072948

Abstract

This study compares phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems for the English–Slovak language pair, using automatic metrics of translation quality. Because the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would produce translations of better quality. An experiment was performed using residuals to compare the scores of automatic accuracy metrics (BLEU_n) for SMT with those for NMT. The results confirmed the assumption that NMT yields better translation quality regardless of the system used. There were statistically significant differences between SMT and NMT, in favor of NMT, for all BLEU_n scores. NMT achieved better translation quality for journalistic texts translated from English into Slovak, regardless of whether the system was trained on general texts, such as Google Translate, or on domain-specific texts, such as the European Commission’s (EC’s) tool.
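To make the BLEU_n metric mentioned above more concrete, the following is a minimal sketch of a sentence-level BLEU score. It assumes that BLEU_n denotes BLEU computed from n-gram orders 1 to n against a single reference, with the standard brevity penalty and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu_n`) and the example sentences are illustrative, not taken from the study, and the authors' exact implementation may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def bleu_n(hypothesis, reference, max_n):
    """Unsmoothed sentence-level BLEU using n-gram orders 1..max_n (an assumption)."""
    precisions = [modified_precision(hypothesis, reference, k)
                  for k in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Illustrative sentences, not data from the study.
hypothesis = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
for n in range(1, 5):
    print(f"BLEU_{n} = {bleu_n(hypothesis, reference, n):.4f}")
```

Higher orders reward longer matching word sequences, which is why scores typically fall as n grows for the same sentence pair.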

Highlights

Machine translation (MT) is a sub-field of computational linguistics that primarily focuses on automatic translation from one natural language into another without any human intervention [1]. Neural machine translation (NMT) is an approach used by many online translation services, such as Google Translate, Bing, Systran, and eTranslation.
Multiple comparisons (Bonferroni adjustment) for BLEU_1 (Table 2a) show a significant difference between each NMT system (GT_NMT or eTranslation_NMT) and both GT_SMT and mt@ec_SMT (p < 0.001), in favor of NMT (Figure 1a).
Our research aimed to establish whether machine translation based on neural networks achieves a higher quality than its predecessor, statistical machine translation, in terms of translation accuracy.
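The Bonferroni adjustment named in the highlights can be illustrated with a short sketch. It shows one common form of the adjustment, in which each raw p-value is multiplied by the number of comparisons and capped at 1. The four system labels come from the text, but the raw p-values below are invented placeholders chosen only to exercise the function; they are not results from the study, and the statistical software used by the authors may apply the adjustment differently.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


systems = ["GT_SMT", "mt@ec_SMT", "GT_NMT", "eTranslation_NMT"]
pairs = list(combinations(systems, 2))  # 4 systems give 6 pairwise comparisons

# Placeholder raw p-values (hypothetical, NOT the study's results).
placeholder_p = [0.40, 0.0001, 0.00005, 0.0001, 0.00002, 0.30]
raw = dict(zip(pairs, placeholder_p))

adjusted = bonferroni_adjust(raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]

for pair, p in adjusted.items():
    print(pair, round(p, 5))
```

The adjustment is conservative: with six comparisons, a raw p-value must fall below roughly 0.0083 to remain under the 0.05 threshold after adjustment.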

Summary

Introduction

Machine translation (MT) is a sub-field of computational linguistics that primarily focuses on automatic translation from one natural language into another without any human intervention [1]. Neural machine translation (NMT) is an approach used by many online translation services, such as Google Translate, Bing, Systran, and eTranslation. It uses a deep neural network to process huge amounts of data and depends primarily on the training data from which it learns. To predict the probability of a word sequence, the network must be able to remember the preceding words in a sentence. Feedforward neural networks (FNNs) process each input independently of the rest of the sentence, so they cannot retain this context on their own. NMT systems therefore commonly use an encoder-decoder architecture. The encoder reads the source sentence and encodes it into a vector that captures the “meaning” of the input sequence. The decoder processes this vector to produce an output sequence in the target language.
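The contrast between context-free and context-aware processing, and the encoder-decoder flow described above, can be sketched with a toy recurrent network in plain Python. Everything here is an assumption made for illustration: the tiny vocabulary, the vector size, the random untrained weights, and the helper names (`feedforward`, `step`, `encode`, `decode`). Because the weights are untrained, the decoder's output is meaningless as a translation; the sketch only shows how information moves through the architecture, and real NMT systems are far larger and are trained on parallel corpora.

```python
import math
import random

random.seed(0)
DIM = 4  # size of the hidden state (arbitrary for this sketch)
VOCAB = ["the", "dog", "bit", "man"]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


EMBED = {word: [random.uniform(-1, 1) for _ in range(DIM)] for word in VOCAB}
W_IN = rand_matrix(DIM, DIM)          # input-to-hidden weights
W_REC = rand_matrix(DIM, DIM)         # hidden-to-hidden (recurrent) weights
W_OUT = rand_matrix(len(VOCAB), DIM)  # hidden-to-vocabulary scores


def feedforward(word):
    """Context-free layer: the output depends on the current word only."""
    return [math.tanh(x) for x in matvec(W_IN, EMBED[word])]


def step(word, hidden):
    """Recurrent update: combines the current word with the previous state."""
    from_input = matvec(W_IN, EMBED[word])
    from_state = matvec(W_REC, hidden)
    return [math.tanh(x + y) for x, y in zip(from_input, from_state)]


def encode(sentence):
    """Fold the whole source sentence into a single context vector."""
    hidden = [0.0] * DIM
    for word in sentence:
        hidden = step(word, hidden)
    return hidden


def decode(context, max_len=3):
    """Greedily emit words, starting from the encoder's context vector."""
    hidden, output = context, []
    for _ in range(max_len):
        scores = matvec(W_OUT, hidden)
        word = VOCAB[scores.index(max(scores))]
        output.append(word)
        hidden = step(word, hidden)
    return output


a = encode(["the", "dog", "bit", "the", "man"])
b = encode(["the", "man", "bit", "the", "dog"])
print("same words, different order, same context vector?", a == b)
print("untrained decoder output:", decode(a))
```

The two sentences contain the same words, yet the recurrent encoder produces different context vectors for them, whereas `feedforward("dog")` returns the same vector wherever the word appears.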

Objectives

Methods

Results

Discussion

Conclusion