Abstract

Transliteration removes the script barriers. Unfortunately, Punjabi is written in four different scripts, i.e., Gurmukhi, Shahmukhi, Devnagri, and Latin. The Latin script is understandable for nearly all factions of the Punjabi community. The objective of our work is to transliterate the Punjabi Gurmukhi script into Latin script. There has been considerable progress in Punjabi to Latin transliteration, but the accuracy of present-day systems is less than 50% (Google Translator has approximately 45% accuracy). We do not have the facility of a rich parallel corpus for Punjabi, so we cannot use the corpus-based techniques of machine learning that are in vogue these days. The existing systems of transliteration follow grapheme-based approach. The grapheme-based transliteration is unable to handle many scenarios such as tones, inherent schwa, glottal stops, nasalization, and gemination. In this article, the grapheme-based transliteration has been augmented with phonetic rectification where the Punjabi script is rectified phonetically before applying character-to-character mapping. Handling the inherent short vowel schwa was the major challenge in phonetic rectification. Instead of following the fixed syllabic pattern, we devised a generic finite state transducer to insert schwa. The accuracy of our transliteration system is approximately 96.82%.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call