Abstract

Sentiment analysis (SA) of Arabic tweets is a complex task due to the rich morphology of the Arabic language and the informal nature of language on Twitter. Previous research on the SA of tweets mainly focused on manually extracting features from the text. Recently, neural word embeddings have been utilized as less labor-intensive representations than manual feature engineering. Most of these word embeddings model the syntactic information of words while ignoring the sentiment context. In this paper, we propose to learn sentiment-specific word embeddings from Arabic tweets and use them in Arabic Twitter sentiment classification. Moreover, we propose a feature ensemble model of surface and deep features. The surface features are manually extracted features, and the deep features are generic word embeddings and sentiment-specific word embeddings. Extensive experiments are performed to test the effectiveness of the surface and deep feature ensemble, pooling functions, embedding size, and cross-dataset models. The recent language representation model BERT is also evaluated on the task of SA of Arabic tweets. The models are evaluated on three different datasets of Arabic tweets, and they outperform the previous results on all these datasets with a significant increase in the F-score. The experimental results demonstrate that: 1) the highest performing model is the ensemble of surface and deep features and 2) the approach achieves state-of-the-art results on several benchmark datasets.
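As a rough illustration of what a surface and deep feature ensemble with a pooling function could look like, the sketch below pools per-token vectors from two embedding tables into fixed-size tweet vectors and concatenates them with a surface feature vector. It is only a sketch under stated assumptions: the abstract does not say which pooling functions, embedding sizes, or surface features are used, so the pooling choices (`avg`, `max`, `min`), the zero vector for tweets with no known tokens, and all names (`pool`, `tweet_features`, the toy embedding tables) are hypothetical.

```python
from typing import Dict, List, Sequence


def pool(vectors: Sequence[Sequence[float]], how: str) -> List[float]:
    """Collapse a list of equal-length token vectors into one vector."""
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if how == "avg":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    if how == "min":
        return [min(c) for c in columns]
    raise ValueError(f"unknown pooling function: {how}")


def embed_tweet(tokens: Sequence[str],
                table: Dict[str, List[float]],
                dim: int,
                how: str) -> List[float]:
    """Pool the vectors of known tokens; assumed fallback is a zero vector."""
    vectors = [table[t] for t in tokens if t in table]
    return pool(vectors, how) if vectors else [0.0] * dim


def tweet_features(tokens: Sequence[str],
                   generic: Dict[str, List[float]],
                   sentiment: Dict[str, List[float]],
                   surface: Sequence[float],
                   dim_generic: int,
                   dim_sentiment: int,
                   how: str = "avg") -> List[float]:
    """Concatenate pooled generic and sentiment-specific embeddings
    with manually extracted surface features (hypothetical layout)."""
    return (embed_tweet(tokens, generic, dim_generic, how)
            + embed_tweet(tokens, sentiment, dim_sentiment, how)
            + list(surface))


# Toy data for illustration only; real embeddings would be learned from tweets.
generic_table = {"good": [1.0, 2.0], "day": [3.0, 0.0]}
sentiment_table = {"good": [0.5], "day": [-0.5]}
features = tweet_features(["good", "day"], generic_table, sentiment_table,
                          surface=[1.0, 0.0], dim_generic=2, dim_sentiment=1)
print(features)
```

The resulting vector could then be passed to any standard classifier; which classifier the authors use is not stated in this excerpt.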

Highlights

  • The abundance of user-generated content in the form of social media websites has produced massive amounts of unstructured text on the web

  • The F-score (F1), Precision (P), and Recall (R) of the positive and negative classes are as follows: F1 = 2 × (P × R)/(P + R), P = tp/(tp + fp), and R = tp/(tp + fn), where tp is the number of positive tweets classified correctly as positive, fp is the number of negative tweets falsely classified as positive, fn is the number of positive tweets falsely classified as negative, and tn is the number of negative tweets correctly classified as negative

  • In this paper, we proposed an ensemble of surface and deep features for sentiment classification of Arabic tweets
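The precision, recall, and F-score definitions in the highlights above can be checked with a few lines of Python. This is a minimal sketch for the positive class: the function name `precision_recall_f1` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here, not something the source specifies.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute P, R, and F1 from confusion counts.

    tp: positive tweets classified as positive
    fp: negative tweets classified as positive
    fn: positive tweets classified as negative
    """
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * (p * r) / (p + r) if (p + r) else 0.0
    return p, r, f1


# Made-up counts for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

The same function gives the negative-class scores if the roles of the classes are swapped, so that tn plays the role of tp and the two error counts trade places.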


Summary

Introduction

The abundance of user-generated content in the form of social media websites has produced massive amounts of unstructured text on the web. This text contains sentiment and opinions that are valuable for both individuals and organizations. Given a unit of text, the task of sentiment analysis is to classify the text as positive, negative, or neutral. Sentiment analysis of Arabic tweets is a complex task due to the rich morphology of the Arabic language and the informal nature of language on Twitter. Approaches to sentiment analysis include supervised learning techniques that exploit machine learning algorithms with feature engineering and

