Abstract

In this paper, we address sentiment analysis of dialectal comments extracted from social media. These comments reflect spoken Algerian usage: they are written in Arabic and/or Latin characters and may be in Modern Standard Arabic, French, or the local dialect. This complexity gives rise to a large number of text processing issues. The contributions of this work are fourfold. First, we build an Algerian dialect sentiment dataset of 11,760 comments collected from diverse social media platforms. Second, we train Skip-Gram and CBOW word2vec models on a corpus of 466,424 comments; these models are used to enhance the sentiment dataset with semantically similar words. Third, we propose a set of preprocessing steps adapted to dialectal texts. Finally, we implement and evaluate several machine learning classifiers (SVM and three Naive Bayes variants: Bernoulli NB, Gaussian NB, and Multinomial NB) and two deep learning architectures (CNN, RNN) to compare the dataset in its original version, in a version transcribed to Latin characters, and in a version semantically enhanced by the word2vec models. On the dataset transcribed to Latin characters, the classifiers reach accuracies of 84.21% (MNB) and 64.11% (CNN); on the transcribed dataset enhanced by the word2vec models, they reach 83.70% (SVM) and 65.21% (RNN).
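The enhancement of comments with semantically similar words can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors below are made-up stand-ins for trained word2vec embeddings, the tokens are illustrative only, and the names `TOY_VECTORS`, `cosine`, `most_similar`, and `enhance` are hypothetical. The idea shown is simply that each known token in a comment is extended with its nearest neighbor by cosine similarity.

```python
import math

# Made-up 2-d vectors standing in for trained word2vec embeddings (assumption).
TOY_VECTORS = {
    "mlih": [0.90, 0.10],
    "bien": [0.85, 0.15],
    "khayeb": [0.10, 0.90],
    "mauvais": [0.15, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=1):
    """Return the topn words closest to `word`; empty list if out of vocabulary."""
    if word not in vectors:
        return []
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def enhance(comment, vectors, topn=1):
    """Append the nearest neighbors of each known token to the comment's tokens."""
    tokens = comment.split()
    extra = []
    for token in tokens:
        extra.extend(most_similar(token, vectors, topn))
    return tokens + extra


print(enhance("film mlih", TOY_VECTORS))
```

In this sketch, out-of-vocabulary tokens are left unchanged and contribute no neighbors. With real embeddings, the neighbor lookup would be performed by the trained Skip-Gram or CBOW model rather than by a hand-written dictionary.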
