Abstract

Twitter is considered a rich resource for collecting people's opinions in different domains, and it has attracted researchers to develop automatic Sentiment Analysis (SA) models for tweets. In this work, a semantic Arabic Twitter Sentiment Analysis (ATSA) model is developed based on supervised machine learning (ML) approaches and semantic analysis. Most existing Arabic SA approaches represent tweets using the bag-of-words (BoW) model. The main limitation of this model is that it is semantically weak: words are treated as independent features, and the semantic associations between them are ignored. As a result, synonymous words that appear in two tweets are represented as different independent features. To overcome this limitation, this work proposes enriching the tweet representation with concepts, using Arabic WordNet (AWN) as an external knowledge base. In addition, different concept representation approaches are developed and evaluated with naive Bayes (NB) and support vector machine (SVM) classifiers on an Arabic Twitter dataset. The experimental results indicate that using concept features improves the performance of the ATSA model compared with the basic BoW representation. The improvement reached 4.48% with the SVM classifier and 5.78% with the NB classifier.
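The BoW limitation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `TOY_CONCEPTS` map, the concept identifiers, and the English placeholder tokens are hypothetical stand-ins for a real Arabic WordNet lookup, and the helper names are chosen only for this example.

```python
from collections import Counter

# Hypothetical toy map from words to concept IDs. A real system would
# obtain these from Arabic WordNet; English placeholders are used here.
TOY_CONCEPTS = {
    "happy": "C_JOY",
    "glad": "C_JOY",
    "sad": "C_SORROW",
    "unhappy": "C_SORROW",
}

def bow_features(tokens):
    """Plain bag-of-words: each distinct token is its own feature."""
    return Counter(tokens)

def concept_enriched_features(tokens, concepts=TOY_CONCEPTS):
    """Bag-of-words plus one concept feature per token that has a concept."""
    features = Counter(tokens)
    features.update(concepts[t] for t in tokens if t in concepts)
    return features

def shared_features(a, b):
    """Feature names that occur in both representations, sorted."""
    return sorted(set(a) & set(b))

tweet_a = ["i", "am", "happy"]
tweet_b = ["we", "are", "glad"]

# With BoW the two tweets share no feature, although they are synonymous.
print(shared_features(bow_features(tweet_a), bow_features(tweet_b)))
# With concepts added, both tweets carry the same concept feature.
print(shared_features(concept_enriched_features(tweet_a),
                      concept_enriched_features(tweet_b)))
```

In this toy case the synonyms "happy" and "glad" stay separate under BoW but map to one shared concept feature after enrichment, which is the effect the concept representation is meant to achieve.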

Highlights

  • Twitter is considered to be one of the most popular microblogs

  • Two incorporation strategies were used, Add concepts (AddC) and BoC, with two word sense disambiguation (WSD) methods, manual and automatic. They used the support vector machine (SVM) classifier and found that the AddC strategy with the manual WSD method achieved the best performance, with an accuracy of 90.20%, an improvement of 5.3% in sentiment analysis (SA) performance over the BoW baseline

  • Different tweet representation approaches were evaluated to determine which one most improved the performance of the Arabic Twitter Sentiment Analysis (ATSA) model
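The two incorporation strategies named in the highlights can be sketched as follows. The highlights do not define them, so this sketch assumes that AddC keeps the original words and appends their concepts, and that BoC keeps only the concepts and drops words without one. The `DEMO_CONCEPTS` map and the function names are hypothetical, and no word sense disambiguation is attempted.

```python
from collections import Counter

# Hypothetical word-to-concept map standing in for a WordNet lookup.
DEMO_CONCEPTS = {"great": "C_GOOD", "excellent": "C_GOOD", "awful": "C_BAD"}

def add_concepts(tokens, concepts=DEMO_CONCEPTS):
    """Assumed AddC: original words plus the concepts of matched words."""
    features = Counter(tokens)
    features.update(concepts[t] for t in tokens if t in concepts)
    return features

def bag_of_concepts(tokens, concepts=DEMO_CONCEPTS):
    """Assumed BoC: concepts only; words without a concept are dropped."""
    return Counter(concepts[t] for t in tokens if t in concepts)

tokens = ["service", "was", "great"]
print(add_concepts(tokens))     # words and the C_GOOD concept
print(bag_of_concepts(tokens))  # the C_GOOD concept only
```

Under these assumptions, AddC preserves lexical evidence for words that have no concept entry, while BoC yields a smaller feature space at the cost of discarding such words.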

Summary

INTRODUCTION

Twitter is considered to be one of the most popular microblogs. It has allowed people to communicate, share comments, and express their opinions on almost all aspects of daily life at an increasing rate. The vector space model (VSM) [2], also called the bag-of-words (BoW) model, is a fundamental text representation model used in most ML approaches because of its simplicity and effectiveness. Several studies, most of them on English text, have proposed a semantic concept representation model in various text mining (TM) fields, including clustering [3], topic classification [4, 5, 6], and SA [7, 8]. Unlike existing Arabic SA models, which represent tweet texts in their lexical space based on BoW features, the proposed semantic concept representation approach represents tweets in their semantic space by taking into account the semantic relationships between words, using the Arabic WordNet (AWN). The last section concludes the paper and gives directions for future work.

RELATED WORKS
ARABIC TWITTER SENTIMENT ANALYSIS MODEL
Text Preprocessing
Features Extraction
CONCEPTS REPRESENTATION APPROACH
Concepts Identification
Concepts Incorporation
EXPERIMENTS AND EVALUATION
Arabic Twitter Dataset
Evaluation Method and Performance Measurements
Findings
CONCLUSION AND FUTURE WORKS