Abstract

Phonetic typing using the English alphabet has become widely popular on social media and chat services. As a result, text that mixes English and Bangla words and phrases has become increasingly common. Existing transliteration tools perform poorly on such text. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can satisfactorily transliterate both English words and phonetically typed Bangla words. This is achieved through a hybrid of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, which significantly outperforms the benchmark transliteration tool.

Highlights

  • In this era of globalization, people are unprecedentedly exposed to information from global sources

  • A total of 3537 unique words were chosen from the sentence database for the experiment

  • Three-stage Hybrid Transliteration (THT) successfully produced the desired transliterations for over 98% of the input words

Summary

INTRODUCTION

In this era of globalization, people are exposed to information from global sources to an unprecedented degree. Transliteration refers to the phonetic conversion of words across different pairs of languages [2]–[4]. It is a challenging task because pronunciation rules vary across languages and the target language sometimes lacks exact or similar-sounding phonemes. One example is transliterating the proper noun Parvez/Parves (a Persian name) into Arabic, whose alphabet has no letter that sounds like either ‘P’ or ‘V’. Another challenge is transliterating phonetically typed text, in which native-language words are written primarily in the English alphabet. We propose a novel, robust Three-stage Hybrid Transliteration (THT) framework that converts English words into Bangla and can also transliterate phonetically typed words satisfactorily.
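The general idea of combining a dictionary with rules can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the THT framework itself: the names `ENGLISH_DICT`, `CONSONANTS`, `VOWELS`, `rule_based`, and `transliterate` are hypothetical, the handful of entries is invented for illustration, and the three stages, scoring, and ranking of the actual method are not modelled.

```python
# Hypothetical toy data: none of these entries come from the THT paper.
# Dictionary path: known English words mapped to a Bangla spelling.
ENGLISH_DICT = {"bus": "বাস", "phone": "ফোন"}

# Rule path: Latin letter groups mapped to Bangla consonants.
CONSONANTS = {"kh": "খ", "k": "ক", "m": "ম", "b": "ব", "l": "ল"}

# Each vowel maps to (independent form, dependent vowel sign).
VOWELS = {"a": ("আ", "া"), "i": ("ই", "ি")}


def rule_based(word: str) -> str:
    """Greedy longest-match mapping from Latin letters to Bangla graphemes."""
    out = []
    prev_was_consonant = False
    i = 0
    while i < len(word):
        # Try two-letter groups before single letters (e.g. "kh" before "k").
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                prev_was_consonant = True
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                # Use the vowel sign only when it attaches to a consonant.
                out.append(sign if prev_was_consonant else independent)
                prev_was_consonant = False
                break
        else:
            # No rule matched: copy the character through unchanged.
            chunk = word[i]
            out.append(chunk)
            prev_was_consonant = False
        i += len(chunk)
    return "".join(out)


def transliterate(word: str) -> str:
    """Prefer the dictionary; fall back to rules for everything else."""
    key = word.lower()
    if key in ENGLISH_DICT:
        return ENGLISH_DICT[key]
    return rule_based(key)


if __name__ == "__main__":
    for w in ("bus", "ami", "khali"):
        print(w, "->", transliterate(w))
```

In this sketch, `transliterate("bus")` takes the dictionary path, while `transliterate("ami")` and `transliterate("khali")` fall through to the rules. Always preferring the dictionary is a simplification: it ignores the case where the same spelling is both an English word and a phonetically typed Bangla word.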

RELATED WORKS
SCOPE AND CHALLENGES
English Words (EEn)
Phonetic Typed Bangla Words (EBn)
Pronunciation Variation in EEn
Irregularity of EBn
Coincidence of EEn and EBn
PROPOSED THT FRAMEWORK
Phonetic transliteration
Bangla Alphabet and Spelling Rules: There are 11 vowels and 39 consonants in …
Candidate generation
Scoring Mechanism
Spelling Improvement
Preprocessing
The HRDP Algorithm
Candidate Ranking
Flexibility of the Framework
Hyperparameters
Experimental design
THT as a component of MT
THT as a standalone transliteration tool
Potential for Dataset Generation
CONCLUSION