Extracting Parallel Sentences from Low-Resource Language Pairs with Minimal Supervision

Xiayang Shi,Chun Xu,Pei Cheng,Zhenqiang Yu,Xinyi Liu

doi:10.1088/1742-6596/2171/1/012044

Open Access

Journal: Journal of Physics: Conference Series
Publication Date: Jan 1, 2022
Citations: 1
License type: cc-by

Affiliation: Zhengzhou University of Light Industry, Xinjiang University of Finance and Economics

Keywords: Parallel Sentence, Bilingual Sentence Pairs

Abstract

Machine translation systems currently depend on parallel sentence corpora, and the number of available parallel sentences affects translation performance, especially for low-resource languages. In recent years, learning cross-lingual word representations from non-parallel corpora has offered a new way to obtain bilingual sentence pairs with few resources and little supervision. In this paper, we propose a new method along these lines. First, we learn cross-lingual mappings from a small amount of monolingual data. Then we construct a classifier to extract bilingual parallel sentence pairs. Finally, we demonstrate the effectiveness of our method on the low-resource Uygur-Chinese language pair through machine translation experiments, and achieve good results.

Full Text