Abstract

In the face of rapid globalization, translation plays a vital role in sustaining native languages. Most Neural Machine Translation (NMT) research in Natural Language Processing has achieved impressive results by training on parallel corpora. Low-resource languages suffer from poor performance because parallel data is scarce. Creating a parallel corpus for a language pair is expensive and requires people with expert knowledge of both languages. In this research, we examine the feasibility of developing a translator for the Sinhala-Tamil language pair using monolingual corpora. Byte Pair Encoding (BPE) is applied to overcome the Out-Of-Vocabulary (OOV) problem in both Sinhala and Tamil. The first part of the research uses a monolingual word embedding approach to translate between Sinhala and Tamil using only monolingual corpora. In the second part, we use both parallel and monolingual corpus data with the Transformer architecture. The BLEU score and a synonym analysis are used to evaluate the proposed approach.
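To illustrate how BPE addresses the OOV problem mentioned above, the following is a minimal sketch of the generic merge-learning procedure. It is not the implementation used in this research: the function names (`learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, the toy English word counts, and the number of merges are all assumptions chosen for illustration. The same logic applies to Sinhala or Tamil text, since Python strings are split by Unicode code point.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a word-frequency table."""
    # Start from single characters plus an end-of-word marker (assumed "</w>").
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        i = 0
        while i < len(symbols) - 1:
            if (symbols[i], symbols[i + 1]) == pair:
                symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
            else:
                i += 1
    return symbols


# Toy data (hypothetical counts); "lowest" never appears in it.
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(word_freqs, num_merges=10)
print(apply_bpe("lowest", merges))
```

Because an unseen word such as "lowest" is broken into subword units that were learned from other words, it can still be represented without falling back to an unknown-word token.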
