RUTUT: Roman Urdu to Urdu Translator Based on Character Substitution Rules and Unicode Mapping

Mobeen Shahroz, Saleem Ullah, Gyu Sang Choi, Arif Mehmood, Muhammad Faheem Mushtaq

doi:10.1109/access.2020.3031393

Abstract

Roman Urdu is the Urdu language written in English (Latin) letters for communication. The two forms are pronounced the same, but they differ in spelling and in the shapes of their alphabets. A survey reports that 300 million people speak Urdu, with about 11 million speakers in Pakistan, most of whom prefer Roman Urdu for textual communication. Because most modern technologies, such as computers and mobile phones, use the English script, local Urdu users have to type Urdu with English letters, which produces Roman Urdu. In this research, the Roman Urdu to Urdu Translator (RUTUT) is proposed; it consists of preprocessing methods, rule-based character substitution, and Unicode-based character mapping techniques. It can transliterate messages or descriptions from Roman Urdu script to Urdu script, which may help Urdu speakers express their messages more efficiently. The focus of this research is to analyze the issues related to Roman Urdu to Urdu script transliteration and to develop a translator based on the concepts of transliteration. This research analyzed Roman Urdu data and identified different rule-based character substitution techniques that transform Roman Urdu into Urdu script at the fundamental level. The system was implemented in the Python programming language using Jupyter Notebook in the Anaconda distribution, and a user-friendly Graphical User Interface (GUI) was created with the Tkinter library. To evaluate RUTUT, different translation tests were performed and their results were compared with those of the well-known Google online translator and the ijunoon online transliteration tool. The analysis of the results shows that the proposed RUTUT approach translates more accurately than both the Google online translator and ijunoon online transliteration.
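The abstract names two core steps, rule-based character substitution and Unicode-based character mapping. The Python sketch below illustrates the general idea under stated assumptions: the substitution table `RULES` is a small hypothetical sample (not RUTUT's actual rule set), each Roman Urdu letter or digraph maps directly to one Urdu Unicode code point, and a greedy longest-match scan makes digraphs such as "kh" and "sh" win over their single letters. The names `RULES`, `transliterate_word`, and `transliterate` are illustrative only.

```python
# Hypothetical sketch of rule-based character substitution with Unicode mapping.
# The table is an illustrative sample, NOT the rule set used by RUTUT.

# Each Roman Urdu letter or digraph maps to an Urdu Unicode code point.
RULES = {
    # Digraphs: two Roman letters that stand for one Urdu letter
    "sh": "\u0634",  # ARABIC LETTER SHEEN
    "kh": "\u062E",  # ARABIC LETTER KHAH
    "ch": "\u0686",  # ARABIC LETTER TCHEH
    "aa": "\u0627",  # ARABIC LETTER ALEF
    # Single letters
    "a": "\u0627",   # ALEF
    "b": "\u0628",   # BEH
    "d": "\u062F",   # DAL
    "e": "\u06D2",   # YEH BARREE
    "f": "\u0641",   # FEH
    "g": "\u06AF",   # GAF
    "h": "\u06C1",   # HEH GOAL
    "i": "\u06CC",   # FARSI YEH
    "j": "\u062C",   # JEEM
    "k": "\u06A9",   # KEHEH (Urdu kaf)
    "l": "\u0644",   # LAM
    "m": "\u0645",   # MEEM
    "n": "\u0646",   # NOON
    "o": "\u0648",   # WAW
    "p": "\u067E",   # PEH
    "q": "\u0642",   # QAF
    "r": "\u0631",   # REH
    "s": "\u0633",   # SEEN
    "t": "\u062A",   # TEH
    "u": "\u0648",   # WAW
    "w": "\u0648",   # WAW
    "y": "\u06CC",   # FARSI YEH
    "z": "\u0632",   # ZAIN
}

# Longest Roman key in the table; the scan tries this length first.
MAX_KEY = max(len(key) for key in RULES)


def transliterate_word(word: str) -> str:
    """Greedy longest-match substitution over one lowercase Roman Urdu word."""
    out = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first so digraphs beat single letters.
        for size in range(min(MAX_KEY, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched (digit, punctuation, unmapped letter): keep it as-is.
            out.append(word[i])
            i += 1
    return "".join(out)


def transliterate(text: str) -> str:
    """Lowercase the text and transliterate it word by word."""
    return " ".join(transliterate_word(w) for w in text.lower().split())


if __name__ == "__main__":
    print(transliterate("khush"))
```

For example, `transliterate("khush")` yields خوش. Because Urdu script usually leaves short vowels unwritten, a one-to-one table like this one renders "kitab" as کیتاب rather than کتاب; handling such cases needs context-dependent rules, which this sketch does not attempt.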

Highlights

Multilingual content has grown rapidly on the internet over the last decade
In this research, the Roman Urdu to Urdu script translation (RUTUT) model is developed, as shown in Figure 19; it consists of rule-based character substitution and Unicode-based character mapping techniques
At the initial stage, when the user provides Roman Urdu text as input, preprocessing rules are applied to filter out unnecessary data
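The highlights state only that preprocessing rules "filter unnecessary data"; the exact rules are not given here. As one possible interpretation, the hypothetical `preprocess` function below lowercases the input, drops URLs, emoji, and other symbols, collapses a letter repeated three or more times (a common informal-typing habit, as in "bohttt"), and normalizes whitespace. Every one of these rules is an assumption, not a description of RUTUT's actual preprocessing.

```python
import re

# Hypothetical preprocessing rules; RUTUT's actual filters are not specified here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # web links
NON_TEXT_RE = re.compile(r"[^a-z0-9\s.,?!]")     # keep letters, digits, spaces, basic punctuation
ELONGATION_RE = re.compile(r"([a-z])\1{2,}")     # same letter three or more times in a row
SPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Clean raw Roman Urdu input before character substitution (sketch)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove links first, while they are intact
    text = NON_TEXT_RE.sub("", text)      # drop emoji and other symbols
    text = ELONGATION_RE.sub(r"\1", text) # "bohttt" -> "boht"
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(preprocess("Bohttt   ACHA https://example.com"))
```

Removing URLs before stripping symbols matters in this design: otherwise the character filter would leave fragments such as "https..example.com" behind as pseudo-words.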

Summary

Introduction

Multilingual content has grown rapidly on the internet over the last decade. Information retrieval based on cross-lingual [1] and monolingual approaches has gained a lot of attention from the Natural Language Processing (NLP) research community. The World Wide Web (WWW) was once a web of the English language and has become a huge collection of multilingual content. When the information retrieval process issues queries and accesses information in the same language, it is known as monolingual retrieval; cross-lingual retrieval focuses on accessing information in several different languages [2]. NLP researchers are drawn to languages whose scripts are written from right to left, such as Urdu and Arabic.



Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2020
Citations: 49
License type: CC BY 4.0


Similar Papers

Pseudo Transfer Learning by Exploiting Monolingual Corpus: An Experiment on Roman Urdu Transliteration
Muhammad Yaseen Khan ... Tafseer Ahmed
01 Jan 2020

Automatic Detection of Offensive Language for Urdu and Roman Urdu
Muhammad Pervez Akhter ... Irfan Raza Naqvi
IEEE access : practical innovations, open solutions | VOL. 8
01 Jan 2020

Urdu Sentiment Analysis
Iffraah Rehman ... Tariq Rahim Soomro
Applied Computer Systems | VOL. 27
01 Jun 2022

Detecting Cyberbullying in Roman Urdu Language Using Natural Language Processing Techniques
Fahad Rasheed ... Mehmoon Anwar
Pakistan journal of engineering & technology | VOL. 5
19 Sep 2022
