Unsupervised SMT: an analysis of Indic languages and a low resource language

Shefali Saxena, Shweta Chauhan, Paras Arora, Philemon Daniel

doi:10.1080/0952813x.2022.2115142

Abstract

The rapid globalisation of language technology and the fast expansion of the Internet have brought nations and their cultures closer together, and the demand for interaction across languages has risen enormously. However, for many low-resource language (LRL) pairs and domains, Machine Translation (MT) is still not viable because parallel data is lacking, and the problem remains unsolved. Recent studies that use only monolingual datasets have shown strong results for both Phrase-based Statistical MT (PBSMT) and Neural MT (NMT) systems, and earlier work has shown that unsupervised Statistical MT outperforms unsupervised NMT, especially for dissimilar language pairs. This study presents a collection of ten unsupervised SMT translation tasks built from monolingual datasets covering the Dravidian and Indo-Aryan language families and one low-resource endangered language. The experiments evaluate the systems with different tokenizers, across various language pairs, using several evaluation metrics over multiple iterations. The statistical significance of the test results is computed for each evaluation metric to assess the true quality of the translation systems.
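The abstract does not say which significance test is used. One common choice for comparing two MT systems on the same test set is paired bootstrap resampling. The sketch below assumes per-sentence metric scores (higher is better) that are aggregated by their mean. It does not model corpus-level BLEU, which cannot be decomposed this way. The function name `paired_bootstrap` and its parameters are hypothetical and are not the authors' implementation.

```python
import random
from typing import Sequence


def paired_bootstrap(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Hypothetical sketch: estimate a p-value for "system A is not better than B".

    Assumes per-sentence scores for the same test sentences, higher is better,
    aggregated by the mean over each resampled test set.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty score lists")
    rng = random.Random(seed)
    n = len(scores_a)
    a_not_better = 0
    for _ in range(n_samples):
        # Resample sentence indices with replacement; the SAME indices are
        # used for both systems, which is what makes the test "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            a_not_better += 1
    # Fraction of resampled test sets on which A failed to beat B.
    return a_not_better / n_samples
```

If system A scores higher than system B on every sentence, A wins on every resample and the estimate is 0.0. A small value (for example, below 0.05) is usually read as evidence that A's advantage is not a sampling artefact of the test set.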