Discovering Semantic and Sentiment Correlations using Short Informal Arabic Language Text

Salha Alosaimi, Khan Muhammed

doi:10.14569/ijacsa.2017.080126

Abstract

Semantic and sentiment analysis have received a great deal of attention over the last few years because of the important role they play in many fields, including marketing, education, and politics. Social media has given researchers tremendous opportunities to collect huge amounts of data as input for semantic and sentiment analysis. Using the Twitter API, we collected around 4.5 million Arabic tweets and used them to propose a novel automatic unsupervised approach that captures patterns of words and sentences with similar contextual semantics and sentiment in informal Arabic, at both the word and sentence levels. We used a language modeling (LM) approach, a statistical model that can estimate the distribution of natural language effectively. In our experiments, the proposed model outperformed the classic bigram and latent semantic analysis (LSA) models in most cases at the word level. To handle the large volume of data, we applied several text processing techniques, followed by removal of unique words based on their relevance to the problem.

Keywords: Informal Arabic, Big Data, Sentiment analysis, Opinion Mining (OM), semantic analysis, bigram model, LSA model, Twitter

Highlights

The last decade has seen a huge increase in the number of internet users in the Middle East
This motivated us to focus on the problems that exist in informal Arabic semantic and sentiment analysis and to encourage researchers to participate more in this field
The proposed method was compared with the bigram [X] and latent semantic analysis (LSA) models
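
For readers unfamiliar with the bigram baseline mentioned above, the following is a minimal sketch of a bigram language model with add-alpha smoothing. It is not the authors' implementation: the function names (`train_bigram`, `bigram_prob`), the smoothing choice, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Sentence boundary markers are an illustrative convention.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that serves as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent (previous, current) pairs
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)


# Hypothetical toy corpus, not taken from the paper's tweet collection.
corpus = ["الخدمة ممتازة جدا", "الخدمة سيئة جدا", "الخدمة ممتازة"]
unigrams, bigrams = train_bigram(corpus)

# Vocabulary of possible next tokens: all words plus the end marker.
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

p_seen = bigram_prob(unigrams, bigrams, "الخدمة", "ممتازة", len(vocab))
p_unseen = bigram_prob(unigrams, bigrams, "الخدمة", "جدا", len(vocab))
print(p_seen, p_unseen)
```

Smoothing keeps unseen word pairs from receiving zero probability, which matters for informal text where spelling variants are common. A real pipeline would need proper Arabic tokenization and normalization before counting.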

Summary

Introduction

The last decade has seen a huge increase in the number of internet users in the Middle East. This growth has helped enrich the amount of Arabic content on the web. A large number of these users are active on social networks. Because most of them write in informal Arabic on social media, semantic and sentiment analysis becomes more challenging. One of the main challenges is the limited amount of research that focuses on informal Arabic sentiment analysis. This motivated us to focus on the problems that exist in informal Arabic semantic and sentiment analysis and to encourage researchers to participate more in this field. A single tweet is a small piece of data, but annotating millions of them, then applying machine learning techniques and analyzing classification models to understand the polarity of different words, is a difficult and expensive job.

Methods

Results

Conclusion