Abstract

Gujarati is the language of everyday communication in the state of Gujarat, India, and is officially recognized by the constitution and the government of India. The Gujarati script is based on the Devanagari script. An idiom is an expression, phrase, or word whose meaning differs from the literal meaning of the words in it. Idioms represent the cultural heritage of the Gujarati language and are used to communicate effectively and convey an accurate message. No existing machine translation system accurately translates Gujarati idioms into English or any other language. Different idiom phrases can be generated by adding one or more diacritics, as well as a suffix, to the root (base) form of an idiom. These many forms of a single idiom make automatic idiom identification and machine translation more challenging. This paper focuses on the design and implementation of diacritic- and suffix-based rules for dynamic phrase generation and detection of Gujarati idioms. The implementation helps identify a Gujarati idiom present in any of its possible forms in Gujarati text. Executed on 7050 different Gujarati idiom phrases, it yields an accuracy of 99.73%. The results are encouraging enough to make the proposed implementation useful for natural language processing tasks related to Gujarati idioms.

Highlights

  • Machine translation is a sub-field of Natural Language Processing (NLP), which is itself a sub-field of Artificial Intelligence (AI)

  • In the Gujarati language, one idiom can be used in many ways; that is, a specific idiom may have many forms or phrases

  • From an exhaustive, in-depth study of 3240 Gujarati idioms and their 7050 different idiom forms, 15 rules were derived. These rules insert diacritics and suffixes into the base (root) form of a Gujarati idiom. The dynamically generated idiom forms are then used to identify any idiom phrase inside the text
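The generate-then-match idea described in the highlights can be illustrated with a minimal Python sketch. It is not the paper's implementation: the endings listed here are a handful of common Gujarati verb endings chosen for illustration rather than the 15 rules, and the lexicon entry `EXAMPLE_IDIOM`, the helper `nfc`, and the functions `generate_forms` and `find_idioms` are hypothetical names. The example idiom "આંખ આડા કાન કરવા" means roughly "to deliberately ignore".

```python
import unicodedata

# Illustrative endings only; these are assumptions, not the paper's 15 rules.
DIACRITIC_ENDINGS = ["ે", "ી", "ો"]  # vowel signs attached to the final consonant
SUFFIX_ENDINGS = ["વું", "વા", "શે", "તા", "્યા", "્યું"]  # suffixes appended to the root

# Hypothetical lexicon entry: (fixed words, verb root) for "આંખ આડા કાન કરવા".
EXAMPLE_IDIOM = ("આંખ આડા કાન", "કર")


def nfc(text):
    """Normalize to NFC so that equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text)


def generate_forms(fixed_words, verb_root):
    """Return the set of surface forms built from one base idiom."""
    base = f"{fixed_words} {verb_root}"
    return {nfc(base + ending) for ending in DIACRITIC_ENDINGS + SUFFIX_ENDINGS}


def find_idioms(text, idioms):
    """Return sorted (base form, matched form) pairs found in the text."""
    text = nfc(text)
    hits = []
    for fixed_words, verb_root in idioms:
        for form in generate_forms(fixed_words, verb_root):
            if form in text:  # plain substring match, kept simple on purpose
                hits.append((f"{fixed_words} {verb_root}", form))
    return sorted(hits)


sentence = "તેણે મારી વાત પર આંખ આડા કાન કર્યા."
print(find_idioms(sentence, [EXAMPLE_IDIOM]))
```

Plain substring matching keeps the sketch short, but it can also match a generated form that is only the beginning of a longer word, so a fuller system would likely check word boundaries and cover many more endings.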


Summary

Introduction

Machine translation is a sub-field of Natural Language Processing (NLP), which is itself a sub-field of Artificial Intelligence (AI). Natural language processing is the study of a language through analysis of its structure and morphology. It is challenging because different languages have different grammatical structures. Vocabulary is important for the enrichment of a language, and idioms contribute to that enrichment. An idiom is an incomplete phrase used as part of a sentence.

