Abstract

The vocabulary of Urdu is a mixture drawn from many other languages, including Farsi, Arabic and Sanskrit. Although Urdu is the national language of Pakistan, English has the status of the official language. The use of English words in spoken Urdu, as well as in documents written in Urdu, is increasing with the passage of time. The automatic detection of English words written in Urdu script within Urdu text is a complicated task that may require advanced machine learning or deep learning techniques. However, the lack of prior work on a fully automatic system makes the task more challenging. This paper presents the results of initial work that may lead to an approach capable of detecting any English word written in Urdu text. First, an approach is developed to preserve Urdu stories from online sources in a normalized format. Second, a dictionary of English words transliterated into Urdu is developed. The results show that Urdu text can contain different categories of words, including transliterated words, words originating from English, and words with exactly the same pronunciation but a different meaning.
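The two steps named above, normalization and dictionary lookup, can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization rules (NFC composition, removal of Arabic diacritics, mapping Arabic yeh and kaf to their Urdu forms), the three-entry dictionary, and the names `normalize` and `find_english_words` are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Assumed normalization rules; the paper's actual rules are not given in the abstract.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")           # Arabic short-vowel marks
_CHAR_MAP = str.maketrans({"\u064A": "\u06CC",              # Arabic yeh -> Urdu yeh
                           "\u0643": "\u06A9"})             # Arabic kaf -> Urdu kaf
_SEPARATORS = re.compile(r"[\s\u06D4\u060C\u061B\u061F.,!?]+")  # spaces and punctuation


def normalize(text):
    """Return text in one canonical form so that spelling variants compare equal."""
    text = unicodedata.normalize("NFC", text)
    text = _DIACRITICS.sub("", text)
    return text.translate(_CHAR_MAP)


# Hypothetical mini-dictionary: Urdu-script spelling -> English word.
TRANSLITERATED = {
    normalize(urdu): english
    for urdu, english in [
        ("ڈاکٹر", "doctor"),
        ("کمپیوٹر", "computer"),
        ("اسکول", "school"),
    ]
}


def find_english_words(text):
    """Return (urdu_token, english_word) pairs for tokens found in the dictionary."""
    tokens = [t for t in _SEPARATORS.split(normalize(text)) if t]
    return [(t, TRANSLITERATED[t]) for t in tokens if t in TRANSLITERATED]


if __name__ == "__main__":
    for token, english in find_english_words("ڈاکٹر نے کمپیوٹر خریدا۔"):
        print(token, "->", english)
```

A plain lookup of this kind cannot separate the third category mentioned above: a native Urdu word that happens to share its spelling with a dictionary entry would be flagged as English, so context would be needed to resolve such cases.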

