Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision.

Xiayang Shi, Lin Xu, Ping Yue, Chun Xu, Xinyi Liu

doi:10.1155/2022/5296946

Abstract

Machine translation relies on parallel sentences, and the amount of parallel data available is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology for obtaining parallel sentences using only a small bilingual seed lexicon, on the order of hundreds of entries. We first obtain bilingual semantic representations by establishing a cross-lingual mapping between the monolingual word representations of the two languages via the seed lexicon. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences for low-resource language pairs.
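
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cross-lingual mapping is learned or how the classifier is built, so everything below is an assumption made for illustration: the mapping is learned with orthogonal Procrustes (a common choice for seed-lexicon alignment), sentences are represented by averaged word vectors, and a plain cosine-similarity threshold stands in for the deep learning classifier. The function names (`learn_mapping`, `sentence_vector`, `mine_pairs`) and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Rows of src_vecs and tgt_vecs are the embeddings of seed-lexicon word pairs.
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def sentence_vector(tokens, embeddings):
    """Average the embeddings of known tokens; return None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else None


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def mine_pairs(src_sents, tgt_sents, src_emb, tgt_emb, seed_lexicon, threshold=0.9):
    """Return (src_index, tgt_index, score) for the best target of each source sentence.

    Assumption: a cosine threshold replaces the paper's deep learning classifier.
    """
    # Step 1: learn the cross-lingual mapping from the seed lexicon only.
    X = np.stack([src_emb[s] for s, _ in seed_lexicon])
    Y = np.stack([tgt_emb[t] for _, t in seed_lexicon])
    W = learn_mapping(X, Y)

    # Target sentence vectors are computed once and reused.
    tgt_vecs = [sentence_vector(t, tgt_emb) for t in tgt_sents]

    # Step 2: score candidate pairs in the shared space and keep confident ones.
    pairs = []
    for i, sent in enumerate(src_sents):
        sv = sentence_vector(sent, src_emb)
        if sv is None:
            continue
        mapped = sv @ W  # project the source sentence into the target space
        best_j, best_score = None, threshold
        for j, tv in enumerate(tgt_vecs):
            if tv is None:
                continue
            score = cosine(mapped, tv)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            pairs.append((i, best_j, best_score))
    return pairs
```

In this sketch, only the seed lexicon provides supervision; the sentence collections themselves are unaligned. A trained classifier, as the abstract describes, would replace the thresholded `cosine` score.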

Journal: Computational Intelligence and Neuroscience
Publication Date: Aug 3, 2022
License type: CC BY

Similar Papers

Extracting Parallel Sentences from Low-Resource Language Pairs with Minimal Supervision
Xiayang Shi ... Xinyi Liu
Journal of Physics: Conference Series | VOL. 2171
01 Jan 2021

Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs
Shaolin Zhu ... Tianqi Li
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
10 Mar 2023

An Explainable Evaluation of Unsupervised Transfer Learning for Parallel Sentences Mining
Shaolin Zhu ... Chenggang Mi
01 Jan 2020

A Diagnostic Evaluation Approach for English to Hindi MT Using Linguistic Checkpoints and Error Rates
Renu Balyan ... Antonio Toral
01 Jan 2013
