Sentiment lexicon for sentiment analysis of Saudi dialect tweets

Abdulmohsen Al-Thubaity, Qubayl Alqahtani, Abdulaziz Aljandal

doi:10.1016/j.procs.2018.10.494

Open Access

https://doi.org/10.1016/j.procs.2018.10.494

Abstract

Twitter is one of the most widely used social media platforms in Saudi Arabia and is a rich source for mining the public's attitude towards political, social, and economic matters. Sentiment analysis identifies the polarity (positive, negative, or neutral) of a given tweet, using either machine learning approaches or sentiment lexicons. This paper presents two resources. The first is the Saudi dialect sentiment lexicon (SauDiSenti), a lexicon for sentiment analysis of Saudi dialect tweets. SauDiSenti comprises 4431 words and phrases from Modern Standard Arabic (MSA) and Saudi dialects, manually extracted from a previously labelled dataset of tweets obtained from trending hashtags in Saudi Arabia. The second is a testing dataset comprising 1500 tweets evenly distributed over three classes: positive, negative, and neutral. To evaluate SauDiSenti, we measured precision, recall, and F-measure and compared the results with those of AraSenTi, a larger Arabic sentiment dictionary. The results suggest that AraSenTi outperforms SauDiSenti only when just positive and negative tweets are considered, whereas SauDiSenti outperforms AraSenTi when positive, negative, and neutral tweets are all considered. Despite its small size, SauDiSenti shows promising results for sentiment analysis of Saudi dialect tweets in comparison to the larger, automatically constructed AraSenTi. SauDiSenti and the testing dataset are available for download at http://corpus.kacst.edu.sa/more_info.jsp.
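As a rough illustration of the lexicon-based approach and the evaluation measures named in the abstract, the following minimal Python sketch labels a tweet by counting matched lexicon entries and computes precision, recall, and F-measure for one class. It is not the authors' implementation: the toy lexicon, the majority-vote rule, whitespace tokenization, and the function names `classify` and `precision_recall_f` are all assumptions made for illustration, and the sketch matches single words only, whereas SauDiSenti also contains phrases.

```python
from collections import Counter

# Hypothetical toy lexicon. The entries are illustrative and are NOT taken
# from SauDiSenti or AraSenTi.
TOY_LEXICON = {
    "ممتاز": "positive",
    "جميل": "positive",
    "سيء": "negative",
    "فاشل": "negative",
}


def classify(tweet, lexicon):
    """Label a tweet by majority vote over matched lexicon entries.

    Assumption: whitespace tokenization and single-word matching only.
    Ties, including the case of no matches, are labelled neutral.
    """
    counts = Counter(lexicon[tok] for tok in tweet.split() if tok in lexicon)
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def precision_recall_f(gold, predicted, label):
    """Return (precision, recall, F-measure) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Under these assumptions, a tweet with no lexicon matches falls into the neutral class, which is one simple way a lexicon can cover all three classes of the testing dataset.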

Full Text