Clustering Arabic Tweets for Sentiment Analysis

Diab Abuaiadah, Mustafa Jarrar, Dileep Rajendran

doi:10.1109/aiccsa.2017.162

Abstract

This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient, and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity, 0.764, while the second-best purity was 0.719. These results are important because they run contrary to findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.
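
The purity figures quoted in the abstract can be illustrated with a minimal sketch, assuming the usual definition of purity: each cluster is credited with its most frequent true label, and the credited counts are divided by the total number of items. The function name `purity` and the toy labels below are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the fraction of items that carry the majority label of their cluster.

    cluster_ids and true_labels are parallel sequences: cluster_ids[i] is the
    cluster assigned to item i, true_labels[i] is its gold sentiment label.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = {}
    for cluster_id, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster_id, Counter())[label] += 1

    # Credit every cluster with the size of its most frequent label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six tweets, two clusters, two sentiment classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["pos", "pos", "neg", "neg", "neg", "pos"]
toy_purity = purity(toy_clusters, toy_labels)  # 4 of 6 items match their cluster's majority
```

Under this definition a purity of 1.0 means every cluster contains a single class, so the reported 0.764 would correspond to roughly three in four tweets falling in a cluster dominated by their own sentiment label.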

Publication Date: Oct 1, 2017
Citations: 28
License type: mit