Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features

Youssef Fares, Marwan Torki, Muhammed Ezzeldin, Kareem Abdel-Salam, Zeyad El-Zanaty, Karim El-Awaad, Aliaa Mohamed

doi:10.18653/v1/w19-4626

Abstract

Studies of Dialectal Arabic are becoming more important every day, as dialectal Arabic has become the primary written and spoken form of Arabic used online in informal settings. One important problem in this area is dialect identification. This paper reports several techniques that can be applied to this task and evaluates their performance on the Multi Arabic Dialect Applications and Resources (MADAR) Arabic Dialect Corpora. Our results show that improving on traditional systems that use frequency-based features and non-deep-learning classifiers is challenging. We propose several models based on different word and document representations. Our top model achieves a macro-averaged F1 score of 65.66 on MADAR's small-scale parallel corpus of 25 dialects and Modern Standard Arabic (MSA).
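Macro-averaged F1, the metric reported above, computes an F1 score separately for each class (here, each of the 26 labels) and then takes their unweighted mean. This means a rarely predicted dialect counts as much as a common one. The sketch below is a minimal, dependency-free illustration of that definition, using F1 = 2TP / (2TP + FP + FN) per class. It is not the authors' evaluation script. The function name `macro_f1` and the toy labels are hypothetical, chosen only for illustration. A score such as 65.66 corresponds to this value multiplied by 100.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Hypothetical helper: macro-averaged F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    # Unweighted mean: every class contributes equally, regardless of frequency
    return sum(per_class) / len(per_class)


# Toy example with illustrative city-level labels (not data from the paper)
gold = ["CAI", "CAI", "TUN", "MSA"]
pred = ["CAI", "TUN", "TUN", "MSA"]
print(round(100 * macro_f1(gold, pred), 2))  # 77.78
```

In this toy case, CAI and TUN each get F1 = 2/3 and MSA gets 1. Their mean is 7/9.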

Full Text