Abstract

The main objective of this study is to design a model that applies a novel technique to sentiment analysis in the Arabic language. Sentiment analysis is a task that draws on web mining, Natural Language Processing (NLP) and Machine Learning (ML). Most research on sentiment analysis has focused on English-language texts, so research on sentiment analysis in Arabic and other languages is still in its infancy. This study empirically evaluates three Feature Selection Methods (FSM), namely Information Gain (IG), Chi-square (CHI) and Gini Index (GI), and three classification approaches, namely Association Rule (AR) mining, the N-gram model and the meta-classifier approach, for sentiment classification in the Arabic language. A number of experiments were carried out on the Opinion Corpus for Arabic (OCA). The results were favorable and varied with the algorithm used and the number of selected features; they show that FS methods can improve the performance of sentiment classification in the Arabic language. Furthermore, CHI feature selection gave the best performance among the FS methods, and the meta-classifier, a combination approach, outperformed the other approaches. In conclusion, this study shows that the combination approach (meta-classifier) with the chi-square FS method gives the most accurate classification, with accuracy as high as 90.80%.
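
To make the combination approach more concrete, here is a minimal Python sketch of a meta-classifier in which the class labels predicted by the component classifiers become the input to a second-level learner. It is an illustration under stated assumptions, not the authors' implementation: the function names `train_meta` and `predict_meta`, the toy labels, and the lookup-table meta-learner with a majority-vote fallback are all hypothetical choices made here for brevity.

```python
from collections import Counter, defaultdict


def train_meta(component_outputs, true_labels):
    """Learn a meta-level table: tuple of component labels -> most frequent true label.

    component_outputs -- one tuple per document holding the label predicted
                         by each component classifier (hypothetical inputs)
    true_labels       -- the gold label of each document
    """
    table = defaultdict(Counter)
    for outputs, gold in zip(component_outputs, true_labels):
        table[tuple(outputs)][gold] += 1
    # Keep only the most frequent gold label for each output pattern.
    return {pattern: counts.most_common(1)[0][0]
            for pattern, counts in table.items()}


def predict_meta(meta, outputs):
    """Predict from component outputs; unseen patterns fall back to a majority vote."""
    pattern = tuple(outputs)
    if pattern in meta:
        return meta[pattern]
    return Counter(pattern).most_common(1)[0][0]


# Toy data: three component classifiers, four training documents.
outputs = [("pos", "pos", "neg"), ("pos", "neg", "neg"),
           ("neg", "neg", "neg"), ("pos", "pos", "neg")]
gold = ["pos", "neg", "neg", "pos"]
meta = train_meta(outputs, gold)
print(predict_meta(meta, ("pos", "neg", "neg")))
```

A real system would replace the lookup table with a trained learner and would produce the component outputs on held-out data, so that the meta-level does not simply memorize the training set.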

Highlights

  • Today, the use of online social media has grown extensively across many activities, ranging from social chat among family members and friends to banking transactions, fashion purchases and the expression of viewpoints

  • The N-gram model and Association Rule (AR) classifiers were initially applied to the whole document-term feature space, without feature selection/reduction methods, to examine the overall accuracy of sentiment analysis in the Arabic language

  • Using a meta-classifier, the outputs for all the class labels of the component classifiers are viewed as new …

  • The purpose is mainly to highlight the best results achieved when the AR classifier was implemented with the Information Gain, Chi-square and Gini Index feature selection methods, using feature sets of varying sizes (100 to 500)
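
The feature selection step mentioned in the highlights can be sketched as follows. This is a minimal Python illustration of chi-square term scoring over a two-class corpus followed by top-k selection, assuming document-level presence/absence counts; the function names `chi_square_scores` and `select_top_k` and the English toy tokens are hypothetical, and the study's own preprocessing and implementation may differ.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Return {term: chi-square score} for a corpus with exactly two classes.

    docs   -- list of token lists, one per document
    labels -- list of class labels, same length as docs
    """
    n = len(docs)
    reference = labels[0]  # treat the first label seen as the reference class
    n_ref = sum(1 for y in labels if y == reference)
    n_other = n - n_ref

    df_ref, df_other = Counter(), Counter()  # document frequencies per class
    for tokens, y in zip(docs, labels):
        target = df_ref if y == reference else df_other
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_ref) | set(df_other):
        a = df_ref[term]      # reference-class documents containing the term
        b = df_other[term]    # other-class documents containing the term
        c = n_ref - a         # reference-class documents without the term
        d = n_other - b       # other-class documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus: "good" and "bad" separate the classes, "film" and "plot" do not.
docs = [["good", "film"], ["good", "plot"], ["bad", "film"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
print(select_top_k(chi_square_scores(docs, labels), 2))
```

In an experiment like the one described, `k` would be swept over the feature-set sizes (100 to 500) and a classifier retrained on each reduced vocabulary.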


Summary

Introduction

The use of online social media has grown extensively across many activities, ranging from social chat among family members and friends to banking transactions, fashion purchases and the expression of viewpoints. These online comments or opinions cover a variety of topics, including books, movies, electronic products, cars, politics and eateries. This activity has raised the interest of different parties, such as customers, companies and governments, in analyzing and investigating these opinions. Data mining and natural language processing in the aspect of correct extraction of people’s sentiments from a large quantity of reviews in unstructured text

