Abstract
Sentiment analysis is becoming increasingly popular because it offers an indication of wider public opinion of, or attitudes towards, products, services, articles, and so on. Many researchers have shown considerable interest in this field, but most studies have focused on English and other Indo-European languages, and very few have addressed the problem for Arabic. This is mostly due to the scarcity, or outright absence, of large, freely available Arabic datasets that contain both Modern Standard Arabic (MSA) and Dialectal Arabic (DA). In general, one of the main challenges in developing robust sentiment analysis systems is the availability of such large-scale datasets. They exist in abundance for English, but not for a low-resource language such as Arabic. Recently, there have been some efforts to provide relatively large-scale Arabic datasets dedicated to sentiment analysis, such as LABR and, most recently, BRAD 1.0, which is considered the largest Arabic book reviews dataset for sentiment analysis and machine learning applications. In this work, we present BRAD 2.0, an extension of BRAD 1.0 with more than 200K additional records covering several Arabic dialects. BRAD 2.0 contains a total of 692,586 annotated reviews; each record consists of a single review of a book together with the reviewer's rating of that book on a scale from 1 to 5. The most notable property of BRAD 2.0 is that it combines both MSA and DA. To verify and validate the proposed dataset, we implement several state-of-the-art supervised and unsupervised classifiers to categorize book reviews. For the unsupervised classifiers, we implement several CNN and RNN models that use GloVe-based word embeddings. All classifiers performed well, with the highest accuracies ranging between 90% and 91%. Experimental results show that BRAD 2.0 is rich and robust.
Our key contribution is to make this benchmark dataset available and accessible in order to promote further research in the field of Arabic computational linguistics.
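To illustrate the record structure described above, the following Python sketch models a BRAD-style review as review text plus a 1-to-5 rating. The class and function names are hypothetical. The sample reviews are invented and are not taken from the dataset. The rating-to-polarity mapping (1-2 negative, 3 neutral, 4-5 positive) is an assumed convention for illustration and is not necessarily the labeling used in the BRAD 2.0 experiments.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BookReview:
    """Hypothetical BRAD-style record: review text plus the reviewer's 1-5 rating."""
    text: str
    rating: int

    def __post_init__(self) -> None:
        # Ratings are integers in the closed range 1..5.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be in 1..5, got {self.rating}")


def polarity(rating: int) -> str:
    """Assumed (illustrative) mapping from a star rating to a sentiment label."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def label_distribution(reviews):
    """Count how many reviews fall under each assumed polarity label."""
    return Counter(polarity(r.rating) for r in reviews)


if __name__ == "__main__":
    # Invented sample reviews (MSA-style phrases), not drawn from BRAD 2.0.
    sample = [
        BookReview("كتاب رائع", 5),
        BookReview("ممل جدا", 1),
        BookReview("لا بأس به", 3),
        BookReview("أعجبني", 4),
    ]
    print(label_distribution(sample))
```

Keeping the raw rating in each record, rather than storing only a derived label, leaves room to train either a 5-class rating predictor or a coarser polarity classifier from the same data.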