Abstract

This paper discusses the problem of ambiguity in Jawi-Rumi machine transliteration of Jawi homograph words. Machine transliteration (MT) is the process of automatically converting text from a source script to a target script. In Malay Jawi-Rumi MT, it is difficult to obtain high-accuracy transliteration of homographic Jawi words. Homographs are words that have the same spelling but different meanings and pronunciations. The old Jawi spelling contained many homographs; their number was reduced when “Pedoman Ejaan Jawi yang Disempurnakan” (PEJYD) was introduced by Dewan Bahasa dan Pustaka (DBP) in 1986. The main issue in Malay Jawi-Rumi machine transliteration is that a Jawi word may be transliterated to the wrong Rumi word. For example, the word “بيرو” can be transliterated to ‘biru’ (blue) or ‘biro’ (bureau), and the word “بيليق” can be transliterated to ‘bilik’ (room) or ‘belek’ (turn around). This paper proposes using the Multinomial Naive Bayes (NBM) classification method for homograph disambiguation in Jawi-Rumi MT. In testing, the method reached an accuracy of up to 67 percent.
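To illustrate the general idea of Multinomial Naive Bayes disambiguation, the following is a minimal sketch, not the paper's implementation. The abstract does not state which features the authors use, so this example assumes the surrounding context words serve as features. The training sentences, the names `TRAINING`, `train`, and `classify`, and the use of add-one (Laplace) smoothing are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): context words seen near the
# Jawi homograph "بيرو", each labelled with the intended Rumi reading.
TRAINING = [
    ("langit warna laut", "biru"),
    ("baju warna cerah", "biru"),
    ("pejabat kerajaan pengarah", "biro"),
    ("pengarah politik parti", "biro"),
]


def train(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        class_counts[label] += 1
        for word in context.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(context, model):
    """Return the reading with the highest log posterior for the context."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Log prior of the reading.
        score = math.log(n / total)
        # Add-one smoothing: every vocabulary word gets a pseudo-count of 1.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in context.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("warna langit", model))
print(classify("pengarah parti", model))
```

With this toy data, a context about colour and sky favours ‘biru’, while a context about a director and a party favours ‘biro’. A real system would need a labelled corpus of homograph occurrences and would likely be evaluated per homograph, as the reported accuracy suggests.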
