Abstract

Sentiment analysis has recently become one of the growing areas of research in text mining and natural language processing. Sentiment analysis techniques are increasingly used to categorize opinion text into one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. Most current studies on this topic focus on English texts, and very limited resources are available for other languages such as Arabic. The complexity of Arabic morphology, orthography and dialects makes sentiment analysis for Arabic more challenging. In this study, the Naive Bayes (NB) algorithm and a Multilayer Perceptron (MLP) network are combined into a hybrid system, called NB-MLP, for Arabic sentiment classification. Five datasets were tested: attraction, hotel, movie, product and restaurant. The reviews are classified into positive or negative sentiment polarity using both the standard and the combined systems, and 10-fold cross-validation was used to split each dataset. Across the whole set of experimental data, the results show that the combined system achieves high classification accuracy and has promising potential for Arabic sentiment analysis and opinion mining.
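As a minimal illustration of the standard NB baseline and the evaluation protocol named in the abstract, the sketch below trains a multinomial Naive Bayes polarity classifier with add-one (Laplace) smoothing and estimates its accuracy with k-fold cross-validation (k = 10 by default). The multinomial event model, the smoothing constant, the shuffled strided folds, the toy reviews and all function names (`train_nb`, `predict_nb`, `k_fold_indices`, `cross_validate`) are assumptions for illustration; the study's own preprocessing and NB variant are not described in this summary. Documents are assumed to be already tokenized.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    docs: list of token lists; labels: list of class labels (e.g. "pos"/"neg").
    """
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        log_lik[c] = {tok: math.log((counts[tok] + alpha) / (total + alpha * len(vocab)))
                      for tok in vocab}
    return vocab, log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest posterior; tokens unseen in training are skipped."""
    vocab, log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(docs, labels, k=10, seed=0):
    """Pooled accuracy: each document is tested once by a model trained on the other folds."""
    correct = 0
    for test_idx in k_fold_indices(len(docs), k, seed):
        held_out = set(test_idx)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_nb(train_docs, train_labels)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Hypothetical toy reviews (not the study's datasets).
DOCS = [s.split() for s in [
    "great hotel clean room", "excellent food great service", "lovely view great staff",
    "terrible room dirty", "awful service bad food", "dirty hotel bad staff"]]
LABELS = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = train_nb(DOCS, LABELS)
print(predict_nb(model, "great clean room".split()))  # pos
print(predict_nb(model, "dirty bad room".split()))    # neg
print(cross_validate(DOCS, LABELS, k=3))
```

On real review corpora one would call `cross_validate(docs, labels)` with the default k = 10; the toy corpus above is too small for ten non-empty folds, so the demo uses k = 3.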

Highlights

  • Sentiment analysis encompasses the vast field of effective classification of user-generated text under defined polarities

  • The proposed system increases sentiment analysis accuracy by exploiting both the dependence and the independence assumptions among features

  • Over the whole set of experimental data, the proposed NB-MLP outperformed standard Naive Bayes (NB), with testing accuracies of 99.6%, 81.1%, 98.2%, 96.6% and 89.1% for attraction, hotel, movie, product and restaurant, respectively
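The summary does not spell out how NB and the MLP are wired together in NB-MLP. The sketch below shows one plausible combination, stacking, which is an assumption and not the authors' published design: a multinomial NB model (which treats tokens as conditionally independent given the class) produces a log-odds score, and that score is fed, alongside binary bag-of-words indicators, into a one-hidden-layer MLP whose hidden units can model dependencies between features. Layer sizes, learning rate, epoch count, the toy reviews and all names (`nb_log_odds`, `TinyMLP`, `hybrid_features`) are hypothetical.

```python
import math
import random
from collections import Counter

# Assumed stacking scheme for illustration; not the authors' published NB-MLP architecture.


def nb_log_odds(train_docs, train_labels, doc, alpha=1.0):
    """Multinomial NB score log P(pos|doc) - log P(neg|doc); labels are 1 (pos) or 0 (neg)."""
    vocab = {t for d in train_docs for t in d}
    score = 0.0
    for c, sign in ((1, 1.0), (0, -1.0)):
        class_docs = [d for d, y in zip(train_docs, train_labels) if y == c]
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        log_p = math.log(len(class_docs) / len(train_docs))
        log_p += sum(math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                     for t in doc if t in vocab)
        score += sign * log_p
    return score


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One sigmoid hidden layer plus a sigmoid output unit, trained by per-sample SGD
    on binary cross-entropy."""

    def __init__(self, n_in, n_hidden=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def train_step(self, x, y):
        h, out = self.forward(x)
        d_out = out - y  # gradient of cross-entropy w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            d_h = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
            self.w2[j] -= self.lr * d_out * hj
            self.b1[j] -= self.lr * d_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * d_h * xi
        self.b2 -= self.lr * d_out


def log_loss(model, X, Y):
    eps = 1e-12
    total = 0.0
    for x, y in zip(X, Y):
        p = model.forward(x)[1]
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(X)


def hybrid_features(train_docs, train_labels, vocab, doc):
    """Hypothetical NB-MLP input: NB log-odds followed by binary bag-of-words indicators."""
    return [nb_log_odds(train_docs, train_labels, doc)] + [1.0 if t in doc else 0.0 for t in vocab]


# Hypothetical toy reviews (not the study's datasets).
DOCS = [s.split() for s in [
    "great hotel clean room", "excellent food great service", "lovely view great staff",
    "terrible room dirty", "awful service bad food", "dirty hotel bad staff"]]
LABELS = [1, 1, 1, 0, 0, 0]
VOCAB = sorted({t for d in DOCS for t in d})
X = [hybrid_features(DOCS, LABELS, VOCAB, d) for d in DOCS]

mlp = TinyMLP(n_in=len(X[0]))
loss_before = log_loss(mlp, X, LABELS)
for epoch in range(200):
    for x, y in zip(X, LABELS):
        mlp.train_step(x, y)
loss_after = log_loss(mlp, X, LABELS)
print(round(loss_before, 3), round(loss_after, 3))
```

For brevity the NB score here is computed on the same documents the MLP trains on, which leaks label information; a proper stacked setup would compute it out-of-fold, for example inside each cross-validation split.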

Summary

Introduction

Sentiment analysis encompasses the vast field of effective classification of user-generated text under defined polarities. These scores were integrated with different features, such as unigrams, language-independent features, tweet-specific features and stem polarity features, to create an input feature vector for the SVM classifier. This combination of the machine learning classification approach and the lexicon-based approach led to slightly better results than using a single approach (accuracy 84%). Despite the large size of the resulting resource, many of the entries are neither lemmatized nor diacritized, which limits the usability of their lexicon. In their attempt to build Arabic multi-domain resources for sentiment analysis, ElSahar and ElBeltagy (2015) proposed a semi-supervised approach to generate multi-domain lexica from four multi-domain review datasets. This method makes use of the feature selection capabilities of SVM to select the most efficient unigram and bigram features.
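The paragraph above describes two mechanics: concatenating lexicon polarity scores with n-gram features into one input vector, and using an SVM's feature-selection capability to keep the most useful unigrams and bigrams. A minimal sketch of both follows, assuming a token-to-signed-score lexicon and weights taken from an already trained linear SVM. The SVM itself is not trained here, and ranking by absolute weight is one common selection criterion, not necessarily the one ElSahar and ElBeltagy used. The lexicon, vocabulary and function names are hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_feature_vector(tokens, vocab, lexicon):
    """Unigram counts over `vocab`, followed by summed positive and negative lexicon scores.

    `lexicon` maps a token to a signed polarity score (hypothetical format).
    """
    counts = [tokens.count(term) for term in vocab]
    scores = [lexicon.get(t, 0.0) for t in tokens]
    pos_score = sum(s for s in scores if s > 0)
    neg_score = -sum(s for s in scores if s < 0)  # reported as a non-negative magnitude
    return counts + [pos_score, neg_score]


def select_top_features(names, weights, k):
    """Keep the k features with the largest |weight|, e.g. weights of a trained linear SVM."""
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical lexicon and vocabulary for illustration only.
LEXICON = {"great": 1.0, "clean": 0.5, "dirty": -1.0, "bad": -0.8}
VOCAB = ["great", "room", "dirty"]

tokens = "great room but dirty room".split()
print(build_feature_vector(tokens, VOCAB, LEXICON))  # [1, 2, 1, 1.0, 1.0]
print(ngrams(tokens, 2))  # ['great room', 'room but', 'but dirty', 'dirty room']
```

In a full pipeline, the unigram and bigram names from `ngrams` would be paired with the coefficients of a fitted linear SVM and passed to `select_top_features` to prune the vocabulary before building the final vectors.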

Naive Bayes Classifier
The Proposed NB-MLP
Findings
Conclusion
