Abstract

The rapid globalisation of language technology and the fast expansion of the Internet have brought nations and their cultures closer together, and the demand for cross-language interaction has risen enormously. However, for many low-resource language (LRL) pairs and regions, Machine Translation (MT) is still not viable because of a lack of parallel data, so the challenge of MT remains unsolved. Recent studies that rely on monolingual datasets have reported excellent results with both Phrase-based Statistical MT (PBSMT) and Neural MT (NMT) systems. Earlier work has nonetheless shown that unsupervised Statistical MT (SMT) surpasses unsupervised NMT, particularly across different language pairs. This study presents ten unsupervised SMT translation tasks that use monolingual data from the Dravidian and Indo-Aryan language families and from a low-resource endangered language. The experiments examine the systems with different tokenizers and evaluate them across language pairs, evaluation metrics, and numbers of iterations. The statistical significance of the test results is computed for each evaluation metric to verify the true quality of the systems on the translation tasks.
