Punjabi to ISO 15919 and Roman Transliteration with Phonetic Rectification

Muhammad Raihan Abbas,Dr Khadim Hussain Asif

doi:10.1145/3359991

Muhammad Raihan Abbas, Dr Khadim Hussain Asif

https://doi.org/10.1145/3359991

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Transliteration removes the script barriers. Unfortunately, Punjabi is written in four different scripts, i.e., Gurmukhi, Shahmukhi, Devnagri, and Latin. The Latin script is understandable for nearly all factions of the Punjabi community. The objective of our work is to transliterate the Punjabi Gurmukhi script into Latin script. There has been considerable progress in Punjabi to Latin transliteration, but the accuracy of present-day systems is less than 50% (Google Translator has approximately 45% accuracy). We do not have the facility of a rich parallel corpus for Punjabi, so we cannot use the corpus-based techniques of machine learning that are in vogue these days. The existing systems of transliteration follow grapheme-based approach. The grapheme-based transliteration is unable to handle many scenarios such as tones, inherent schwa, glottal stops, nasalization, and gemination. In this article, the grapheme-based transliteration has been augmented with phonetic rectification where the Punjabi script is rectified phonetically before applying character-to-character mapping. Handling the inherent short vowel schwa was the major challenge in phonetic rectification. Instead of following the fixed syllabic pattern, we devised a generic finite state transducer to insert schwa. The accuracy of our transliteration system is approximately 96.82%.

Full Text