Active learning for Arabic sentiment analysis

Abdelrahman Kaseb, Mona Farouk

doi:10.1016/j.aej.2023.06.082

Abstract

Sentiment analysis has become an essential part of every social network, as it enables decision-makers to learn more about users’ opinions. Arabic sentiment analysis has few datasets relative to English. This paper introduces active learning to the Arabic NLP community and demonstrates its power. Active learning is the process of selecting which data points to label before labelling them, so that the labelled data yields the greatest benefit and performance improves. Because labelling large datasets takes much time and effort, a better approach is to label only selected data and still attain high performance. This paper applies active learning to the ArSarcasm-v2 dataset, which is labelled for sentiment, sarcasm, and dialect, using the SAIDS (“Sentiment Analysis Informed of Dialect and Sarcasm”) model, which predicts sarcasm and dialect and then uses them to predict the sentiment of the text. The paper runs multiple active learning experiments with different setups. It achieves state-of-the-art performance on ArSarcasm-v2 for sentiment analysis, with a 76.71 F1-score using 95% of the training dataset, which means that adding some data points makes performance worse. The paper also demonstrates that, with active learning, we achieve 97%, 99%, and 100% of the highest F1-score reached with all the training data using only 10%, 20%, and 27% of the training data, respectively. This means that using about one-quarter of the training data attains the same F1-score as using the full training data. To the best of our knowledge, this is the first paper to introduce active learning to the Arabic NLP community.
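The abstract does not state which query strategy the experiments use, so the selection idea can only be illustrated generically. The following is a minimal sketch, assuming pool-based uncertainty sampling with a toy nearest-centroid classifier on one-dimensional data. It is not the SAIDS model or the paper's setup, and every name in it (`fit_centroids`, `margin`, `active_learning`, `oracle`) is hypothetical.

```python
def fit_centroids(labelled):
    """Return the mean feature value per class from (x, y) pairs."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def margin(centroids, x):
    """Gap between the two smallest centroid distances; small means uncertain."""
    dists = sorted(abs(x - c) for c in centroids.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def active_learning(pool, oracle, seed_idx, budget, batch_size):
    """Label at most `budget` pool items, always querying the most uncertain ones.

    `oracle` stands in for the human annotator (an assumption of this sketch).
    """
    seeds = set(seed_idx)
    labelled = [(pool[i], oracle(pool[i])) for i in seed_idx]
    unlabelled = [i for i in range(len(pool)) if i not in seeds]
    while len(labelled) < budget and unlabelled:
        centroids = fit_centroids(labelled)
        # Most uncertain (smallest margin) items come first.
        unlabelled.sort(key=lambda i: margin(centroids, pool[i]))
        take = min(batch_size, budget - len(labelled))
        picked, unlabelled = unlabelled[:take], unlabelled[take:]
        labelled += [(pool[i], oracle(pool[i])) for i in picked]
    return fit_centroids(labelled), labelled


# Toy pool: 100 evenly spaced points, positive class above 0.5.
pool = [i / 99 for i in range(100)]
oracle = lambda x: int(x > 0.5)
model, labelled = active_learning(pool, oracle, seed_idx=[0, 99],
                                  budget=20, batch_size=6)
accuracy = sum(predict(model, x) == oracle(x) for x in pool) / len(pool)
print(f"labelled {len(labelled)} of {len(pool)} points, accuracy {accuracy:.2f}")
```

In this toy setting the queries concentrate near the class boundary, which is the intuition behind reaching full-data performance with only a fraction of the labels. A real experiment would swap in the actual model, dataset, and F1 evaluation.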

Full Text