Abstract
Kuwaiti Arabic (KA), like other Arabic dialects, is a spoken variety of Arabic that, unlike Modern Standard Arabic (MSA), has no standardized written convention. With the emergence and spread of social media platforms, Arabic dialects have found their way into the written medium, creating a need to process them alongside MSA. The biggest challenge facing NLP tools is that writers tend to spell dialect words phonetically, as they pronounce them, which produces variation both within a single dialect and between dialects and MSA. Because clear written conventions are a prerequisite for analysing any language or dialect, efforts have been made to establish such conventions for Arabic dialects, but the Kuwaiti dialect has not received the required attention. The current study offers a practical solution for processing written KA. It identifies and extracts the written conventions of KA from natural data, namely over 100K Kuwaiti tweets, which represent a good model of natural language. MADAMIRA, a morphological analyzer designed to process MSA, was extended with the extracted conventions. The analyzer was also enriched with a dictionary of Kuwaiti terms and vocabulary ('lemmas') collected from the Encyclopaedia of Kuwaiti Arabic and from the most used Kuwaiti words on Twitter (now X), which helps it process KA more accurately. The expanded version of the analyzer (MADAMIRA-KA) is the first of its kind designed entirely to process the Kuwaiti dialect and achieved excellent performance in analyzing over 100K Kuwaiti tweets.
The importance of this study lies in the development of this morphological analyzer, which can be used for automated translation, dialect recognition, and sentiment analysis.