Abstract

Sentiment Analysis concerns the analysis of ideas, emotions, evaluations, values, attitudes and feelings about products, services, companies, individuals, tasks, events, titles and their characteristics. With the growth of Internet applications and social networks, Sentiment Analysis has become increasingly important in text mining research and is now used to explore users' opinions on the products and topics discussed online. Advances in Natural Language Processing and Computational Linguistics have benefited Sentiment Analysis studies, especially for sentiments expressed in unstructured or semi-structured languages. In this paper, we present a literature review of the pre-processing task in sentiment analysis, together with an analytical and comparative study of research conducted on Arabic social networks. This study led us to conclude that several works have addressed the generation of stop-word dictionaries. Two approaches are adopted: a manual one, which yields a limited list, and an automatic one, in which the stop-word list is extracted from social-network data according to defined rules. For stemming, two algorithms have been proposed to isolate prefixes and suffixes from dialectal words. However, few works have processed dialects directly, without translation. The Moroccan dialect in particular ranks fifth among the Arabic dialects studied, after the Jordanian, Egyptian, Tunisian and Algerian dialects. Despite the significant lack of studies on Arabic dialects, this comparative study allowed us to draw several conclusions about the difficulties and challenges encountered, as well as possible directions for designing a sentiment analysis pre-processing solution for any dialect.
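The two stop-word strategies and the affix-stripping idea summarised above can be illustrated with a short sketch. It is not the method of any surveyed work: the seed list `MANUAL_STOP_WORDS`, the document-frequency rule in the hypothetical `extract_stop_words`, and the prefix and suffix lists used by the hypothetical `light_stem` are all assumptions chosen for illustration on Darija-style tokens in Arabic script. The stemmer removes at most one listed prefix and one listed suffix, longest match first, and never leaves fewer than `min_stem` characters.

```python
from collections import Counter

# Hypothetical seed list for the manual approach (illustrative only).
MANUAL_STOP_WORDS = {"في", "من", "على", "و"}

# Hypothetical affix lists for a light, dialect-oriented stemmer.
PREFIXES = ("وال", "بال", "ال", "كا", "تا")
SUFFIXES = ("ها", "هم", "كم", "ات", "و")


def extract_stop_words(posts, min_doc_ratio=0.5):
    """Automatic approach under an assumed rule: a token is a stop word
    if it occurs in at least `min_doc_ratio` of the posts."""
    doc_freq = Counter()
    for post in posts:
        doc_freq.update(set(post.split()))  # count each token once per post
    threshold = min_doc_ratio * len(posts)
    return {tok for tok, df in doc_freq.items() if df >= threshold}


def light_stem(word, min_stem=2):
    """Strip at most one prefix and one suffix (longest match first),
    never leaving fewer than `min_stem` characters."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word


def preprocess(post, stop_words):
    """Remove stop words, then light-stem the remaining tokens."""
    return [light_stem(tok) for tok in post.split() if tok not in stop_words]


posts = ["الفيلم في السينما زوين", "الخدمة في البنك", "ما عجبنيش في الأخير"]
# Combine the manual seed list with the automatically extracted one.
stop_words = MANUAL_STOP_WORDS | extract_stop_words(posts, min_doc_ratio=0.6)
print(preprocess(posts[0], stop_words))
```

On this toy sample only "في" passes the frequency threshold, and the first post reduces to the stems "فيلم", "سينما" and "زوين". On a tiny corpus a pure frequency rule can also flag content words, which is one reason to combine it with a manually reviewed list.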

Highlights

  • Social networking has become one of the most popular communication tools

  • Sentiment analysis plays an essential role in decision-making in fields such as politics and digital marketing, as well as in the study of social phenomena

  • There is a marked lack of pre-processing work on unstructured languages such as Arabic dialects (for example, Darija), even though these dialects are a rich source of information, being the most widely used by the population on non-professional social networks


Summary

INTRODUCTION

Social networking has, in some ways, become one of the most popular communication tools. It allows people to convey opinions, share experiences or talk about anything concerning them online (Tan et al, 2011). Monitoring social media has become an important way to analyse and detect trends by studying and evaluating opinions on various topics such as politics (Eason et al, 1995), services (teaching, health, etc.), marketing and business products. People can share their opinions in an unconstrained environment, and companies can extract useful insights for their decision-making process. User-generated content on the Web is generally unstructured and requires substantial pre-processing and analysis to extract useful knowledge (Melville et al, 2019). These steps depend on the nature of the language (structured or unstructured) and generally differ from one study to another.
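Because user-generated posts are noisy, a pre-processing pipeline for Arabic-script text often starts with a cleaning and normalisation pass before tokenisation. The sketch below shows one possible set of choices: removing URLs and @mentions, stripping diacritics and the tatweel character, mapping alef variants to bare alef, and cutting letters repeated three or more times back to two. The function name `normalize_post` and every rule in it are assumptions for illustration, not a pipeline taken from the surveyed works.

```python
import re

# Illustrative normalisation rules for noisy Arabic-script posts (assumed).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Tanween, harakat, shadda and sukun (U+064B to U+0652) plus superscript alef (U+0670).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida, used only to stretch words visually
# Alef with madda, hamza above or hamza below, mapped to bare alef (U+0627).
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")
# Any character repeated three or more times is cut back to two.
ELONGATION_RE = re.compile(r"(.)\1{2,}")


def normalize_post(text):
    """Clean one social-media post before tokenisation."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)
    text = ELONGATION_RE.sub(r"\1\1", text)
    return " ".join(text.split())  # collapse leftover whitespace
```

Keeping two repeated letters rather than one preserves a trace of emphatic elongation, which can carry sentiment; other normalisations, such as mapping teh marbuta or alef maqsura, are further design choices left out of this sketch.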

Pre-processing task: Major steps
Arabic dialect pre-processing challenges
RELATED WORK
Criteria
Comparison
STATISTICAL ANALYSIS AND DISCUSSION
Findings
CONCLUSION AND FUTURE WORKS
