Handling Unknown Words in Neural Machine Translation System

Kamal Deep Garg, Jatin Gupta, Vandana Saini

doi:10.1109/dasa51403.2020.9317169

Abstract

The corpus-based approach is an emerging approach for developing machine translation systems. Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) are two corpus-based approaches. NMT yields better results than both the traditional rule-based approach and the statistical approach. However, an NMT system is computationally more expensive than an SMT system because of the softmax function at its output layer. Because of this complexity constraint, NMT uses a fixed vocabulary, whereas Machine Translation (MT) is an open-vocabulary problem. This mismatch causes out-of-vocabulary (OOV) words in the predictions of the NMT system. To handle these OOV words, Word Embedding (WE) has been used in our NMT model for Punjabi to English. Along with WE, Byte-Pair Encoding (BPE) has been used to increase the effectiveness of the overall system. The system has been evaluated using the automated evaluation metrics BLEU and Translation Error Rate (TER).
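
The abstract does not say how BPE was implemented, so the following is only a minimal sketch of the commonly described merge procedure, not the authors' code. It shows how a fixed set of learned merges can segment a word that never appeared in training into known subword units, which is the property that helps with OOV words. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy English word counts are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict."""
    # Start from characters, with a hypothetical end-of-word marker "</w>".
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(toy_counts, num_merges=10)
    # "lowest" is not in the toy corpus, yet it still decomposes into known units.
    print(segment("lowest", merges))
```

Because every word can fall back to single characters, no input is ever mapped to an unknown token; the number of merges controls the trade-off between vocabulary size and sequence length.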

