Abstract
Arabic is a challenging language for natural language processing (NLP) because of its complex morphology, dialectal variation, and limited annotated resources. Although deep learning models have achieved state-of-the-art results on many NLP tasks, comprehensive comparative studies for Arabic remain scarce. This paper addresses this gap by systematically evaluating three prominent deep learning architectures - Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformers - across five essential Arabic NLP tasks: i) sentiment analysis, ii) named entity recognition, iii) machine translation, iv) text classification, and v) dialect identification. We compare the performance of models trained from scratch with fine-tuned versions of AraBERT, a powerful Transformer-based model pre-trained on a large Arabic corpus. Our experiments use Arabic datasets already available in the literature and evaluate performance with accuracy, F1-score, and BLEU. The results demonstrate the superiority of Transformer-based models, with AraBERT achieving the highest scores on every task. Notably, AraBERT attains 95.2% accuracy on sentiment analysis, exceeding both the RNN and CNN baselines. Similar gains appear in the other tasks, with AraBERT surpassing the RNN and CNN models by 3 BLEU points in machine translation and by 2.3% F1-score in dialect identification. This extensive assessment highlights the strengths and weaknesses of the different deep learning architectures for Arabic NLP. AraBERT's strong performance also demonstrates how transfer learning, combining Transformer architectures with large-scale pre-training, can significantly advance Arabic language technology.
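The abstract does not describe the implementation, but a typical fine-tuning setup for AraBERT on an Arabic sentiment-analysis task might look like the following sketch, built with the Hugging Face Transformers library. The checkpoint name, dataset, split, and hyperparameters shown here are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: fine-tuning an AraBERT checkpoint for binary Arabic
# sentiment classification with Hugging Face Transformers. Dataset, split,
# and hyperparameters are assumptions for illustration only.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "aubmindlab/bert-base-arabertv02"  # a public AraBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Assumed corpus: any Arabic sentiment dataset exposing "text" and "label" columns.
dataset = load_dataset("ajgt_twitter_ar")["train"].train_test_split(test_size=0.2)

def tokenize(batch):
    # Truncate/pad tweets to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report the two metrics used for classification tasks in the paper.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="arabert-sentiment",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

A model trained from scratch (e.g., an RNN or CNN over randomly initialized embeddings) would replace the pre-trained checkpoint above, which is the comparison the paper reports.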