POS-Tagging based Neural Machine Translation System for European Languages using Transformers

Preetham Ganesh, Alexander Peter, Bharat S Rawal, Andi Giri

doi:10.37394/23209.2021.18.5

Abstract

Interaction between human beings has always faced different kinds of difficulties, one of which is the language barrier. It would be a tedious task for someone to learn all the syllables of a new language in a short period and converse with a native speaker without grammatical errors. Moreover, having a language translator on hand at all times would be intrusive and expensive. We propose a novel approach to a Neural Machine Translation (NMT) system using interlanguage word similarity-based model training and Part-Of-Speech (POS) Tagging based model testing. We compare these approaches using two classical architectures: a Luong Attention-based Sequence-to-Sequence architecture and a Transformer-based model. The sentences for the Luong Attention-based Sequence-to-Sequence model were tokenized using the SentencePiece tokenizer, and the sentences for the Transformer model were tokenized using the Subword Text Encoder. Three European languages were selected for modeling, namely Spanish, French, and German. The datasets were downloaded from multiple sources: the Europarl Corpus, the Paracrawl Corpus, and the Tatoeba Project Corpus. Sparse Categorical Cross-Entropy was the evaluation metric during the training stage; during the testing stage, the evaluation metrics were the Bilingual Evaluation Understudy (BLEU) Score, the Precision Score, and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score.
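As an illustration of the training-stage metric named above, the following minimal sketch computes sparse categorical cross-entropy over one target sentence, using integer token ids rather than one-hot vectors. The function name, the toy probabilities, and the choice to skip padding positions (`pad_id=0`) are assumptions made for this example and are not taken from the paper's implementation.

```python
import math


def sparse_categorical_crossentropy(target_ids, predicted_probs, pad_id=0):
    """Mean negative log-likelihood of integer targets.

    Positions whose target equals pad_id are skipped (an assumption here;
    the abstract does not say how padding is handled).
    """
    total, count = 0.0, 0
    for target, probs in zip(target_ids, predicted_probs):
        if target == pad_id:
            continue
        # Clamp to avoid log(0) when the model assigns zero probability.
        total += -math.log(max(probs[target], 1e-12))
        count += 1
    return total / count if count else 0.0


# Toy example: a 3-token vocabulary, two real tokens followed by one pad.
target_ids = [2, 1, 0]
predicted_probs = [
    [0.1, 0.2, 0.7],    # distribution at position 0, target id 2
    [0.1, 0.8, 0.1],    # distribution at position 1, target id 1
    [0.9, 0.05, 0.05],  # padding position, ignored
]
loss = sparse_categorical_crossentropy(target_ids, predicted_probs)
print(f"loss = {loss:.4f}")
```

The "sparse" variant differs from ordinary categorical cross-entropy only in how the targets are encoded: each target is a single vocabulary index, which avoids materializing one-hot vectors over a large subword vocabulary.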

Full Text