Abstract

Research efforts in the field of sentiment analysis have increased rapidly in the last few years due to its applicability in areas such as online product purchasing, marketing, and reputation management. Social media and online shopping sites have become a rich source of user-generated data. Manufacturing, sales, and marketing organizations are increasingly turning to this source to get worldwide feedback on their activities and products. Millions of sentences in Urdu and Roman Urdu are posted daily on social sites such as Facebook, Instagram, Snapchat, and Twitter. Disregarding opinions expressed in Urdu and Roman Urdu and considering only the resource-rich English language means losing this vast amount of data. Our research focused on collecting research papers related to the Urdu and Roman Urdu languages and analyzing them in terms of preprocessing, feature extraction, and classification techniques. This paper contains a comprehensive study of research conducted on Roman Urdu and Urdu text for product reviews. The study is divided into categories: collection of relevant corpora, data preprocessing, feature extraction, classification platforms and approaches, limitations, and future work. The comparison was based on evaluating different research factors, such as corpus, lexicon, and opinions. Each reviewed paper was evaluated against a set of benchmarks and categorized accordingly. Based on the results obtained and the comparisons made, we suggest some helpful steps for future studies.

Highlights

  • The Indo-Pak subcontinent is one of the most significant markets for all types of products

  • We provide a detailed study of Roman Urdu and Urdu sentiment analysis, covering data set collection, preprocessing techniques, classification methods, and a comparison of results reported by various researchers

  • We present an overview of a few of the selected research papers that address Roman Urdu and Urdu. In their research, [3] presented a state-of-the-art review on multilingual opinion mining

Summary

Introduction

The Indo-Pak subcontinent, also known as South Asia or the Indian subcontinent, is one of the most significant markets for all types of products and one of the largest markets for all types of organizations. Because this highly populated area is home to approximately 1.95 billion people (14 August 2021), companies are attracted to selling their products there and to understanding people’s experiences, feelings, and emotions about their businesses. The best way to express personal experiences and feelings on the Internet is to use the local language. The subcontinent is very rich in languages: more than 451 languages are spoken in the Indo-Pak subcontinent. Out of the languages taught in this subcontinent, Hindi and