Abstract

Machine comprehension of “Singlish” (an alternative writing system for the Sinhala language) has long been needed. Many Sri Lankans choose it as their writing style in casual conversation such as small talk, chats, and social media comments. The research community in Sri Lanka has tried for a couple of years to find a method for translating Singlish to Sinhala or English, and many of those attempts relied on statistical machine translation because a dataset large enough for deep learning approaches was hard to find. This research addresses the challenge of preparing a dataset for evaluating how well a deep learning approach performs on Singlish-to-English machine translation, and it evaluates a Seq2Seq Neural Machine Translation (NMT) model. The proposed seq2seq model is based purely on the attention mechanism, which has been used to improve NMT by selectively focusing on parts of the source sentence during translation. The proposed approach achieves a BLEU score of 24.13 on Singlish-English after seeing ~0.26 M parallel sentence pairs with a vocabulary of 50 K+ words.
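As a rough illustration of what "selectively focusing on parts of the source sentence" means, the sketch below computes attention weights over a toy source sentence. It is not the paper's model: the dot-product scoring, the function names `softmax` and `attend`, and the two-dimensional toy vectors are all assumptions made for this example, since the abstract does not state which scoring function is used.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function, for illustration).

    Returns one weight per source position and the weighted average of the
    encoder states, usually called the context vector.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values: three source positions, each encoded as a 2-dimensional vector.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)   # the first source position receives the largest weight
print(context)
```

In a trained translation model the encoder and decoder states are learned, so the weights indicate which source words the decoder relies on when producing each target word.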

Highlights

  • Machine comprehension of “Singlish” texts has long been needed

  • Most notably, a text written in Singlish typically contains a mixture of English and Sinhala words

  • In the Sri Lankan context, people tend to write Sinhala in Latin script (the English alphabet) most of the time when they communicate with other native speakers, and they call it “Singlish”

Summary

INTRODUCTION

Singlish is a way of writing Sinhala pronunciation using the English alphabet. The motivation for this research is that texts written in alternative writing systems like Singlish cannot be interpreted in certain circumstances. Many social media platforms offer an option to translate text written in other languages into English if you do not understand the original language. No such option exists for text written in code-mixed languages such as Singlish or Tanglish, because those writing patterns are not recognized as standard languages. In particular, countries where this type of writing system is popular struggle to analyze social media data, as there are no language models implemented

CHALLENGES
TRADITIONAL MACHINE TRANSLATIONS
NEURAL MACHINE TRANSLATION
Transformer
Seq2Seq model with Attention Mechanism
MetaMT
DESIGN AND IMPLEMENTATION
APPROACH
EVALUATION
CONCLUSION