Abstract

It is becoming increasingly difficult to know who is working on what, and how, in computational studies of Dialectal Arabic. This study charts the field through a systematic literature review intended to give insight into the most and least popular research areas, dialects, machine learning approaches, neural network input features, data types, datasets, system evaluation criteria, publication venues, and publication trends. The review follows the norms of systematic reviews and covers all research that adopted a computational approach to dialectal Arabic identification and detection and was published between 2000 and 2020. It collected, analyzed, and collated this research, discovered its trends, and identified research gaps. It revealed, inter alia, that research effort has not been distributed evenly between speech and text or among the vernaculars: there is some bias favoring text over speech, regional varieties over individual vernaculars, and Egyptian over all other vernaculars. Furthermore, there is a clear preference for shallow machine learning approaches; for n-grams, TF-IDF, and MFCC as neural network features; and for accuracy as the statistical measure for validating results. The review also pointed to some glaring gaps in the research: (1) total neglect of Mauritanian and Bahraini in the continuous Arabic language area and of such enclave varieties as Anatolian Arabic, Khuzistan Arabic, Khurasan Arabic, Uzbekistan Arabic, the Sub-Saharan Arabic of Nigeria and Chad, Djibouti Arabic, Cypriot Arabic, and Maltese; (2) scarcity of city dialect resources; (3) rarity of linguistic investigations that would complement this research; and (4) paucity of deep machine learning experimentation.

Highlights

  • Arabic was adopted as an official language of the United Nations by the General Assembly in its 28th session on 18 December 1973

  • Writing direction alone makes the point: English is written left to right, whereas Arabic is written right to left, so tools developed for processing English will not work for Arabic without much tweaking, possibly radical alteration, or even total replacement

  • The keywords used to retrieve articles for this review are: (1) ‘Arabic’, to exclude other languages that might be the subject of investigation; (2) ‘dialect’, to include regional language variation and exclude variation due to age, gender, race, or profession; and (3) ‘detection’ or ‘identification’, to limit the search to computational studies focused on the discovery of dialects. Linguistics, by contrast, is more concerned with explaining variation in terms of geography, age, gender, race, and profession than with spotting and classifying which dialect an utterance belongs to
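
The keyword criteria in the last highlight amount to a Boolean query: Arabic AND dialect AND (detection OR identification). The following is a minimal sketch of such a filter applied to candidate titles, not the authors' actual retrieval procedure. The function name `matches_review_query`, the sample titles, and the choice to match ‘dialect’ as a prefix (so that ‘dialects’ and ‘dialectal’ also count) are assumptions made for illustration.

```python
import re


def matches_review_query(text: str) -> bool:
    """Check: 'Arabic' AND 'dialect*' AND ('detection' OR 'identification')."""
    lowered = text.lower()
    has_arabic = re.search(r"\barabic\b", lowered) is not None
    # Assumption: prefix match, so 'dialects' and 'dialectal' also qualify.
    has_dialect = re.search(r"\bdialect", lowered) is not None
    has_task = re.search(r"\b(detection|identification)\b", lowered) is not None
    return has_arabic and has_dialect and has_task


# Hypothetical titles, used only to exercise the filter.
titles = [
    "Fine-grained Arabic dialect identification",
    "Arabic sentiment analysis of tweets",
    "Dialect detection for Swiss German",
    "Automatic detection of dialectal Arabic in speech",
]

selected = [t for t in titles if matches_review_query(t)]
for title in selected:
    print(title)
```

A filter like this only narrows the candidate pool; a systematic review would still screen each retrieved article by hand against its inclusion criteria.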

Introduction

Arabic was adopted as an official language of the United Nations by the General Assembly in its 28th session on 18 December 1973. Resolution 3190 [1] made Arabic an official and working language of the General Assembly and its Main Committees, in recognition of the fact that it was the language of nineteen Members of the United Nations and a working language in specialized UN agencies. Arabic is the national language of more than 422 million people [2] and is ranked as the fifth most extensively used language in the world. Modern Standard Arabic (MSA) is used exclusively in news bulletins, publications, official speeches, film subtitles, and religious rites and ceremonies [3]. Dialectal Arabic, by contrast, is the intimate variety that speakers feel most comfortable with.
