Abstract
With the recent advances in Natural Language Processing (NLP) technologies, the ability to process, analyze, and understand sentiments expressed in user-generated reviews regarding the products and services they use is becoming more achievable. Despite the latest improvements in this field, little attention has been given to multilingual sentiment analysis. In this article, a framework is presented for sentiment analysis in Arabic and English using two datasets (ASTD, AJGT) along with their translations. Preprocessing techniques, including n-gram tokenization, Arabic-specific stop words removal, punctuation removal, removing repeating characters, parts of speech tagging, stemming, and lemmatization, are applied. Four machine learning classifiers, namely Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM), are employed. We highlight existing specialized research in sentiment analysis for Arabic and English, as well as the employed techniques in each. Furthermore, the impact of preprocessing on accuracy results for both Arabic and English languages is investigated through separate experiments for each step. Experimental results on the ASTD dataset demonstrate close performance across classifiers, with the SVM classifier achieving the highest accuracy of 70%. However, the accuracy varied when using the AJGT dataset, with the NB classifier yielding the best accuracy at approximately 87%. The experiments on the translated datasets from Arabic to English did not exhibit significant differences, although some features performed slightly better using the Arabic datasets.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: The International Arab Journal of Information Technology
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.