Abstract

With the recent evolution of deep learning, machine translation (MT) models and systems are steadily improving. However, research on MT for low-resource languages such as Vietnamese and Korean is still very limited. In recent years, a state-of-the-art context-based embedding model introduced by Google, Bidirectional Encoder Representations from Transformers (BERT), has begun to appear in neural MT (NMT) models in different ways to enhance the accuracy of MT systems. A BERT model for Vietnamese has been developed and has significantly improved natural language processing (NLP) tasks, such as part-of-speech (POS) tagging, named-entity recognition, dependency parsing, and natural language inference. Our research experimented with applying the Vietnamese BERT model to provide POS tagging and morphological analysis (MA) for the Vietnamese sentences, and with applying word-sense disambiguation (WSD) to the Korean sentences, in our Vietnamese–Korean bilingual corpus. In the Vietnamese–Korean NMT system with contextual embedding, the Vietnamese BERT model is concurrently connected to both the encoder and decoder layers of the NMT model. Experimental results assessed through the BLEU, METEOR, and TER metrics show that contextual embedding significantly improves the quality of Vietnamese–Korean NMT.
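
For context, corpus-level BLEU, METEOR, and TER scores of the kind reported above can be computed with standard open-source tooling. The sketch below uses the sacrebleu and NLTK packages purely for illustration; it is not the evaluation code used in the paper, and the hypothesis/reference lists are placeholders.

    # Minimal sketch: corpus-level MT evaluation with BLEU, TER, and METEOR.
    # Assumes the sacrebleu and nltk packages; the sentence lists are placeholders.
    import sacrebleu
    import nltk
    from nltk.translate.meteor_score import meteor_score

    nltk.download("wordnet", quiet=True)  # METEOR needs WordNet data

    hypotheses = ["system output sentence 1", "system output sentence 2"]
    references = ["reference sentence 1", "reference sentence 2"]

    # sacrebleu expects a list of reference streams (one per reference set).
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    ter = sacrebleu.corpus_ter(hypotheses, [references])

    # NLTK computes METEOR per sentence; average over the corpus here.
    meteor = sum(
        meteor_score([ref.split()], hyp.split())
        for hyp, ref in zip(hypotheses, references)
    ) / len(hypotheses)

    print(f"BLEU   = {bleu.score:.2f}")
    print(f"TER    = {ter.score:.2f}")
    print(f"METEOR = {meteor:.4f}")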

Highlights

  • Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained model introduced by Google [1]

  • Our experiments show that using the Vietnamese BERT model in combination with neural MT (NMT) significantly improves the quality of the Vietnamese–Korean translation system, by 1.41 BLEU points and 2.54 TER points

  • We applied POS tagging, which is significantly improved by BERT, to Vietnamese sentences in the Vietnamese–Korean bilingual corpus
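
As an illustration of BERT-based POS tagging of the kind mentioned in the last highlight, the sketch below runs a token-classification pipeline from the Hugging Face transformers library over a Vietnamese sentence. The model identifier is a hypothetical placeholder, not necessarily the Vietnamese BERT model used in this work.

    # Minimal sketch: POS tagging a Vietnamese sentence with a BERT-style
    # token-classification model via the Hugging Face transformers pipeline.
    # "example-org/vietnamese-pos-tagger" is a hypothetical model id.
    from transformers import pipeline

    pos_tagger = pipeline(
        "token-classification",
        model="example-org/vietnamese-pos-tagger",
        aggregation_strategy="simple",  # merge sub-word pieces into whole words
    )

    sentence = "Tôi đang học tiếng Hàn ."
    for token in pos_tagger(sentence):
        print(token["word"], token["entity_group"], round(token["score"], 3))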



Introduction

Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained model introduced by Google [1]. BERT has greatly improved the quality of natural language processing (NLP) tasks [1,2,3]. This state-of-the-art context-based embedding model is trained on two tasks: masked-language modelling (MLM) and next-sentence prediction (NSP). MLM uses the context words surrounding a masked word to predict what the masked word should be. When two sentences are fed into the BERT model, NSP predicts whether or not the second sentence can follow the first. There are different variants of BERT for different languages: A Lite BERT (ALBERT) [4], Robustly Optimized BERT (RoBERTa) [5], and SpanBERT [6] are used for English; FlauBERT [7]
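
To make the MLM objective concrete, the sketch below queries a pre-trained BERT model through the Hugging Face fill-mask pipeline and prints the most likely words for the masked position. The multilingual checkpoint is chosen only for illustration; it is not the specific Vietnamese BERT model used in this work.

    # Minimal sketch of masked-language modelling (MLM) with a pre-trained BERT.
    # bert-base-multilingual-cased is used here purely for illustration.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

    # BERT uses the words to the left and right of [MASK] to predict the hidden word.
    for prediction in unmasker("Hanoi is the [MASK] of Vietnam."):
        print(prediction["token_str"], round(prediction["score"], 3))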

