Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs

Shaolin Zhu, Chun Xu, Yong Yang, Chenggang Mi, Tianqi Li

doi:10.1145/3486677

Abstract

Parallel sentence pairs play an important role in many natural language processing tasks, especially cross-lingual tasks such as machine translation. Many Asian language pairs still lack bilingual parallel sentences. Because collecting bilingual parallel data is time-consuming and difficult, obtaining it automatically is important for many low-resource Asian language pairs. While existing methods have shown encouraging results, they either rely heavily on bilingual data or have drawbacks in an unsupervised setting. To address these issues, we propose a new unsupervised similarity calculation and a dynamic selection metric to obtain parallel sentence pairs in an unsupervised setting. First, our method maps bilingual word embeddings by postdoc adversarial training, which rotates the source space to match the target space without parallel data. Then, we introduce a new cross-domain similarity adaption to obtain parallel sentence pairs. Experimental results on real-world datasets show that our model obtains better accuracy and recall when mining parallel sentence pairs. We also show that the extracted bilingual sentence corpora can significantly improve the performance of neural machine translation.
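
The two steps named in the abstract (map the source embedding space onto the target space, then score candidate pairs with a cross-domain similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an orthogonal mapping matrix `W` is already available (here it is given, not learned adversarially), it assumes each sentence is already represented by a single vector, it substitutes the well-known CSLS score (cross-domain similarity local scaling) for the paper's own similarity measure, and it uses a fixed `threshold` in place of the paper's dynamic selection metric. All function and parameter names are hypothetical.

```python
import numpy as np

def normalize(x):
    # Scale every row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def csls_scores(src, tgt, k=2):
    # CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y), where each r term is the
    # mean cosine similarity to the k nearest neighbours in the other language.
    cos = normalize(src) @ normalize(tgt).T
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)  # one value per source row
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)  # one value per target row
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

def mine_pairs(src, tgt, W, k=2, threshold=0.0):
    # Assumption: W is an orthogonal map from source space to target space.
    scores = csls_scores(src @ W, tgt, k)
    fwd = scores.argmax(axis=1)  # best target for each source sentence
    bwd = scores.argmax(axis=0)  # best source for each target sentence
    # Keep only mutual best matches whose score clears the fixed threshold.
    return [(i, int(j)) for i, j in enumerate(fwd)
            if bwd[j] == i and scores[i, j] > threshold]

# Toy data: four sentence vectors, with the source space rotated by a
# permutation matrix (a simple orthogonal map) and the target order shuffled.
W = np.eye(4)[[1, 2, 3, 0]]
src = np.eye(4) @ W.T
tgt = np.eye(4)[[2, 0, 3, 1]]
pairs = mine_pairs(src, tgt, W)
```

On this toy input `pairs` recovers the shuffled alignment between source and target rows. The mutual-best-match filter is a design choice of the sketch that trades recall for precision; it is not something the abstract specifies.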

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Mar 10, 2023
Citations: 2

Similar Papers

IELTS translation education corpus construction based on bilingual non-parallel data model
Qiaoling Zhou
International Journal of Knowledge-based and Intelligent Engineering Systems | VOL. 25
18 Feb 2022

Japanese translation teaching corpus based on bilingual non parallel data model
Zheng Guo ... Zhu Jifeng
Journal of Intelligent & Fuzzy Systems | VOL. 40
01 Jan 2020

Extracting Parallel Sentences from Low-Resource Language Pairs with Minimal Supervision
Xiayang Shi ... Xinyi Liu
Journal of Physics: Conference Series | VOL. 2171
01 Jan 2021

Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision
Xiayang Shi ... Xinyi Liu
Computational Intelligence and Neuroscience | VOL. 2022
03 Aug 2022
