Abstract

In the face of rapid globalization, translation plays a vital role in sustaining native languages. Most Neural Machine Translation (NMT) research in Natural Language Processing has achieved impressive results by relying on parallel corpora. Low-resource languages suffer from poor translation performance because parallel data are scarce. Creating a parallel corpus for a language pair is expensive and requires people with expert knowledge of both languages. In this research, we investigate the feasibility of developing a translator for the Sinhala-Tamil language pair using monolingual corpora. Byte Pair Encoding (BPE) is applied to address the Out-Of-Vocabulary (OOV) problem in both Sinhala and Tamil. The first part of the research uses a monolingual word embedding approach to translate between Sinhala and Tamil using only monolingual corpora. The second part uses both parallel and monolingual corpus data with a transformer architecture. We evaluate the proposed approach using the BLEU score and synonym analysis.
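
To illustrate how BPE can mitigate the OOV problem, the following is a minimal sketch of the generic BPE merge-learning procedure, not the authors' implementation. The function names (`learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, and the toy English word frequencies are assumptions made for illustration; a real Sinhala or Tamil pipeline would start from the actual monolingual corpora and would likely use an established subword toolkit.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_symbols(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_symbols(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


# Toy, hypothetical word frequencies (not data from the paper).
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_freqs, num_merges=10)
# "lowest" never appears in the toy data, yet it is still segmented into known units.
segments = apply_bpe("lowest", merges)
```

Because an unseen word always decomposes into learned subword units or, in the worst case, single characters, the model never has to fall back on an unknown-word token for it.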
