Abstract

The Arabic language has a very rich vocabulary and takes two main forms: the formal Modern Standard Arabic (MSA) and the informal colloquial dialects. Dialects have grown in importance with the proliferation of social networks, which has produced a vast amount of unstructured dialectal text on the web. The unique properties of MSA and the dialects make it difficult to build sentiment analysis systems by simply adopting models designed for English. In this paper, we present a supervised Arabic sentiment analysis system based on bag-of-words features. We further examine the use of a set of keywords (a lexicon) to improve polarity classification. The system is tested on the freely available Arabic book reviews (LABR) dataset, which includes reviews in both MSA and Egyptian dialect. We used both balanced and unbalanced versions of the dataset. The balanced dataset is small, and hence a large-scale balanced dataset is needed to train the classifier. We also compared the predicted sentiments against the actual reviews for a specific book. Human annotators found ambiguity between some reviews and their ratings; when checked against the predicted sentiment, the prediction gave the more reasonable result. Dialects and sarcasm remain particularly interesting challenges. Experimental results with the adopted logistic regression classifier on LABR are encouraging. However, a key prerequisite for developing robust and efficient Arabic sentiment analyzers is the availability of rich, well-represented datasets.
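To make the approach concrete, the following is a minimal sketch of a bag-of-words logistic classifier, written with only the Python standard library. It is an illustration under stated assumptions, not the system evaluated in the paper: the function names, the whitespace tokenizer, the learning rate, the epoch count, and the four toy reviews are all hypothetical, and none of the data comes from LABR.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace splitting. Real Arabic text would
    # likely need normalization and handling of attached prefixes.
    return text.split()

def build_vocab(docs):
    # Map each distinct token to a column index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(doc, vocab):
    # Bag-of-words: count of each known token; unknown tokens are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=500, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss,
    # with no regularization (hypothetical settings).
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(doc, vocab, w, b):
    # Probability that the review is positive.
    x = vectorize(doc, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy reviews (1 = positive, 0 = negative); NOT from LABR.
TRAIN = [
    ("كتاب رائع وممتع", 1),
    ("رواية جميلة جدا", 1),
    ("كتاب ممل وسيئ", 0),
    ("رواية سيئة جدا", 0),
]

docs = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = build_vocab(docs)
X = [vectorize(d, vocab) for d in docs]
w, b = train_logistic(X, labels)

print(predict_proba("كتاب رائع", vocab, w, b))
print(predict_proba("كتاب ممل", vocab, w, b))
```

A lexicon of polarity keywords, as examined in the paper, could be added to this sketch as extra feature columns alongside the token counts, though how the authors integrated their lexicon is not specified in the abstract.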
