Abstract

Recently, the widespread use of social media and the high availability of internet access have produced textual data that differs considerably from the formal, standard text found on the Web. This includes text written in various Arabic dialects, which are the native spoken languages of Arabic speakers. The presence of written Arabic dialectal text on the Web has brought many new opportunities as well as challenges for machine learning and Arabic language processing. Identifying this type of informal data is crucial for several applications, such as sentiment analysis and machine translation. However, the standard NLP tools developed for traditional data fall short because of the nature of dialectal text. Deep learning tools have proven effective in processing dialectal text from social media. In this paper, we consider a variety of deep learning models for the automatic classification of Arabic dialectal text. We use a large, freely available, manually annotated dataset known as the Arabic Online Commentary (AOC) dataset, which includes several Dialectal Arabic (DA) varieties along with Modern Standard Arabic (MSA) [3]. We consider the most frequent dialects in the dataset, namely Egyptian (EGP), Levantine (LEV), and Gulf (GLF), where the Gulf class also includes Iraqi. Four different deep neural network models have been implemented to examine Arabic dialect classification for each pair of the three dialects (binary classification experiments), as well as in one ternary classification experiment that includes all three dialects together. The results show varying but promising performance of the models for each pair of dialects. Furthermore, we carried out a closer examination of the manually annotated AOC dataset and conclude that its annotated sentences need a thorough refinement and review, since AOC is an important benchmark dataset in the field.
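The experimental design described above (one binary task per pair of the three dialects, plus one ternary task) can be sketched as a data-splitting step. This is a minimal illustration only: the abstract does not specify the four network architectures, so none is shown, and the label strings, the `(sentence, label)` sample format, and the helper name `build_experiments` are hypothetical assumptions rather than the authors' code or the AOC file format.

```python
from itertools import combinations

# Dialect labels named in the abstract; the exact label strings are an assumption.
DIALECTS = ("EGP", "LEV", "GLF")


def build_experiments(samples):
    """Split (sentence, label) samples into 3 binary tasks and 1 ternary task.

    `samples` is a hypothetical list of (sentence, dialect_label) tuples.
    Samples with any other label (e.g. MSA) are dropped, since only the
    three dialects are used in the classification experiments.
    """
    kept = [(text, label) for text, label in samples if label in DIALECTS]
    experiments = {}
    # One binary experiment per unordered pair of dialects: C(3, 2) = 3 tasks.
    for pair in combinations(DIALECTS, 2):
        experiments["-".join(pair)] = [(t, l) for t, l in kept if l in pair]
    # One ternary experiment containing all three dialects together.
    experiments["-".join(DIALECTS)] = kept
    return experiments


if __name__ == "__main__":
    # Toy placeholder data, not taken from AOC.
    toy = [("s1", "EGP"), ("s2", "LEV"), ("s3", "GLF"), ("s4", "MSA"), ("s5", "EGP")]
    for name, data in build_experiments(toy).items():
        print(name, len(data))
```

Each resulting subset would then be used to train and evaluate each of the models separately, so every model is compared on the same three binary tasks and the same ternary task.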
