Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques

Mohammed Maree, Enas Mesqali, Mujahed Eleyat

doi:10.34028/iajit/21/2/8

Abstract

With recent advances in Natural Language Processing (NLP), processing, analyzing, and understanding the sentiments that users express in reviews of products and services is becoming more achievable. Despite these improvements, little attention has been given to multilingual sentiment analysis. This article presents a framework for sentiment analysis in Arabic and English using two datasets (ASTD, AJGT) along with their English translations. Preprocessing techniques are applied, including n-gram tokenization, Arabic-specific stop-word removal, punctuation removal, removal of repeated characters, part-of-speech tagging, stemming, and lemmatization. Four machine learning classifiers are employed: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). We review existing research on sentiment analysis for Arabic and English and the techniques employed in each. Furthermore, the impact of preprocessing on accuracy for both languages is investigated through a separate experiment for each step. Experimental results on the ASTD dataset show close performance across classifiers, with SVM achieving the highest accuracy of 70%. On the AJGT dataset, accuracy varied more across classifiers, with NB yielding the best accuracy at approximately 87%. Experiments on the datasets translated from Arabic to English did not exhibit significant differences, although some features performed slightly better on the Arabic datasets.
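To make a few of the preprocessing steps named in the abstract concrete, here is a minimal sketch of punctuation removal, repeated-character removal, stop-word removal, and n-gram tokenization. It is not the authors' implementation: the function names, the tiny stop-word set, and the rule of collapsing runs of three or more identical characters to one are all assumptions made for illustration. Part-of-speech tagging, stemming, lemmatization, and the four classifiers are not covered here.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the abstract does not give the Arabic-specific list used in the paper.
STOP_WORDS = {"في", "من", "على", "the", "is", "a"}


def strip_punctuation(text):
    # Drop every character whose Unicode category starts with "P".
    # This covers Latin punctuation as well as Arabic marks such as "،" and "؟".
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def collapse_repeats(text):
    # Assumed heuristic: a run of 3+ identical characters becomes one character,
    # so "greaaaat" becomes "great". Legitimate double letters are left alone.
    return re.sub(r"(.)\1{2,}", r"\1", text)


def ngrams(tokens, n):
    # Contiguous word n-grams, each joined into a single string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=1):
    # Apply the steps in a fixed order; the order itself is an assumption.
    text = collapse_repeats(strip_punctuation(text))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return ngrams(tokens, n)


if __name__ == "__main__":
    print(preprocess("The movie is greaaaat!!!"))        # ['movie', 'great']
    print(preprocess("The movie is greaaaat!!!", n=2))   # ['movie great']
```

Each function handles one step, so a single step can be switched off to measure its effect on accuracy, in the spirit of the per-step experiments the abstract describes.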


More From: The International Arab Journal of Information Technology


Similar Papers

Test
Mohammed Maree ... Enas Mesqali
The International Arab Journal of Information Technology | VOL. 21
01 Jan 2024

Sentiment Analysis Using Deep Learning Approaches on Multi-Domain Dataset in Telugu Language
Kannaiah Chattu ... D Sumathi
Journal of Information & Knowledge Management | VOL. -
10 Jan 2024

Sentiment analysis on social media tweets using dimensionality reduction and natural language processing
Erick Odhiambo Omuya ... Michael Kimwele
Engineering Reports | VOL. 5
11 Oct 2022

Sentiment Analysis for Informal Malay Text in Social Commerce
Adiba Nabiha ... Sofianita Mutalib
08 Sep 2021
