Abstract

Research on sentiment analysis of social media in the Mesopotamian-Iraqi Dialect (MID) of Arabic is rare: no reliable dataset has been developed in MID, nor an annotated corpus for sentiment analysis of social media in this dialect. This gap has been the main stumbling block for researchers of sentiment analysis in MID. This paper therefore introduces an annotated corpus of the Mesopotamian-Iraqi Dialect for sentiment analysis in social media, named ACMID (Annotated Corpus of Mesopotamian-Iraqi Dialect), to support future studies. To the best of our knowledge, this is the first annotated corpus that covers both polarity and emotion classification in MID. Facebook, the most popular social platform among Iraqis, was used as the data source. 5000 comments were extracted from popular Iraqi pages and classified by polarity (Positive, Negative, Neutral, Spam) by two Iraqi annotators. At the same time, the annotators classified the same comments according to Ekman's seven universal emotions (Anger, Fear, Disgust, Happiness, Sadness, Surprise, Contempt) or as showing no emotion. Cohen's kappa coefficient was then used to compare the two annotators' results and assess their reliability. Agreement between the two annotators was as high as 0.82 for polarity classification and 0.65 for emotion classification.
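As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of Cohen's kappa for two annotators, computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the toy labels are assumptions made for this example; they are not taken from the ACMID corpus or from the authors' tooling.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, multiply the two annotators'
    # marginal proportions, then sum over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy polarity labels (invented for this sketch, not real corpus data).
annotator_1 = ["Positive", "Positive", "Negative", "Negative"]
annotator_2 = ["Positive", "Negative", "Negative", "Negative"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four comments (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5. The same function applies unchanged to the emotion labels, since it only compares label strings.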

Highlights

  • Mesopotamian-Iraqi Dialect (MID) is a major dialect of Arabic, spoken by more than 40 million people in Iraq and its neighboring countries

  • Iraq is an important country in the Middle East and in the world. It is the cradle of civilization and one of the wealthiest countries in the world in oil reserves and production, which can affect the world economy. Iraq has been a main front in many global events throughout human history, and it is hard to find someone who has not heard of Iraq because of the events that keep happening there

  • To build the new annotated corpus ACMID, two native Iraqi Arabic speakers will tag each comment extracted from Facebook pages and classify it by polarity as Positive, Negative, or Neutral


Summary

INTRODUCTION

Mesopotamian-Iraqi Dialect (MID) is a major dialect of Arabic, spoken by more than 40 million people in Iraq and its neighboring countries. Facebook, as mentioned before, is the main social media platform used by Iraqi people, with more than 21 million users in Iraq [1], so extracting data from Iraqi Facebook pages can be very useful for capturing people's thoughts and opinions. Some researchers have preferred to work on an English version instead of the original Arabic text, because of the complexity of the Arabic language in general and because features of English make it easier to extract more accurate results [3]. This gap has been the main stumbling block for researchers of sentiment analysis in MID. For this reason, this paper introduces a new annotated corpus named ACMID, with data extracted from popular Iraqi Facebook pages, to help future researchers use it in their studies of sentiment analysis in social media written in MID. In this paper, related works are reviewed first, and a brief description of Arabic dialects is given in the third section. The fourth section describes the data collection and pre-processing, the fifth section covers the data annotation and the rules the annotators have to follow, and the sixth section discusses the results of this work.

RELATED WORKS
DATA EXTRACTING AND PRE-PROCESSING
DATA ANNOTATION
RESULTS AND DISCUSSION
CONCLUSION
