The Arabic language is a complex language with little resources; therefore, its limitations create a challenge to produce accurate text classification tasks such as sentiment analysis. The main goal of sentiment analysis is to determine the overall orientation of a given text in terms of whether it is positive, negative, or neutral. Recently, language models have shown great results in promoting the accuracy of text classification in English. The models are pre-trained on a large dataset and then fine-tuned on the downstream tasks. Particularly, XLNet has achieved state-of-the-art results for diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that such parallel success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model in Arabic called AraXLNet, demonstrating its use in Arabic sentiment analysis in order to improve the prediction accuracy of such tasks. The results showed that the proposed model, AraXLNet, with Farasa segmenter achieved an accuracy results of 94.78%, 93.01%, and 85.77% in sentiment analysis task for Arabic using multiple benchmark datasets. This result outperformed AraBERT that obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident using multiple benchmark datasets, thus offering promising advancement in the Arabic text classification tasks.
Read full abstract