Developing Turkish sentiment analysis models using machine learning and e-commerce data

Murat Demircan, Adem Seller, Fatih Abut, Mehmet Fatih Akay

doi:10.1016/j.ijcce.2021.11.003

Abstract

With the growth of Internet usage, users have recently gained much greater access to, and interact much more on, social media, blogs, forums, and review sites. Social media gives access to a large amount of data on various products and services and on social and political events, and analyzing such data yields important feedback about products and services. This study aims to determine the sentiments expressed in social media texts using machine learning methods. Initial research determined that product reviews and ratings on e-commerce websites are the case in which texts and the sentiments they express match best. Reviews of different products, together with their review scores, were collected from an e-commerce website and converted into a table for use in machine learning-based sentiment analysis models. Using the review scores, the reviews were classified into three groups: positive, negative, and neutral. On this basis, Turkish sentiment analysis models were developed using support vector machine (SVM), random forest (RF), decision tree (DT), logistic regression (LR), and k-nearest neighbors (KNN). Cross-validation results on independent test data taken from the same e-commerce website show that the SVM-based and RF-based sentiment analysis models outperform the other models. More specifically, there is no strict ranking between the SVM-based and RF-based models, but their scores are, in general, the highest or, in the worst case, similar to those of the DT-based, LR-based, and KNN-based models. It can be concluded that SVM and RF are viable methods for classifying product reviews into positive, negative, and neutral groups with acceptable error rates.
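The labeling-and-evaluation pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the 1-5 score thresholds (1-2 negative, 3 neutral, 4-5 positive), bag-of-words features with cosine similarity, and simple k-fold splitting. Of the five classifiers, it covers only KNN. The helper names (`score_to_label`, `turkish_lower`, `knn_predict`, `cross_val_accuracy`) are hypothetical.

```python
import math
import random
from collections import Counter


def score_to_label(score: int) -> str:
    # Assumed thresholds for a 1-5 review score; the abstract does not give them.
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def turkish_lower(text: str) -> str:
    # Plain str.lower() maps "I" to "i"; Turkish needs "I" -> "ı" and "İ" -> "i".
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text: str) -> Counter:
    # Bag-of-words term counts with simple punctuation stripping (assumed features).
    tokens = (w.strip(".,!?;:") for w in turkish_lower(text).split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query: Counter, k: int = 3) -> str:
    # train is a list of (Counter, label); majority vote among the k most similar.
    neighbours = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(reviews, k_folds: int = 3, k: int = 3, seed: int = 0) -> float:
    # reviews is a list of (text, score); returns mean accuracy over k folds.
    data = [(tokenize(text), score_to_label(score)) for text, score in reviews]
    random.Random(seed).shuffle(data)
    folds = [data[i::k_folds] for i in range(k_folds)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        correct = sum(knn_predict(train, feats, k) == label for feats, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy, invented reviews for illustration only; not data from the study.
    reviews = [
        ("Çok güzel ürün, tavsiye ederim", 5),
        ("Harika kalite, çok memnunum", 4),
        ("Fena değil, idare eder", 3),
        ("Ortalama bir ürün, idare eder", 3),
        ("Berbat, hiç beğenmedim", 1),
        ("Kötü kalite, beğenmedim", 2),
    ]
    print(round(cross_val_accuracy(reviews), 2))
```

The Turkish-specific lowercasing is included because a naive `lower()` would turn "IŞIK" into "işik" and split one word into two vocabulary entries. For a full comparison of all five methods, scikit-learn's `SVC`, `RandomForestClassifier`, `DecisionTreeClassifier`, `LogisticRegression`, and `KNeighborsClassifier` can be evaluated with `cross_val_score` on the same features.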
