Sentiment Analysis of DANA Application Reviews on Google Play Store Using Naïve Bayes Classifier Algorithm Based on Information Gain

Cindy Caterine Yolanda, Dina Fitria, Syafriandi Syafriandi, Yenni Kurniawati

doi:10.24036/ujsds/vol2-iss1/147


Open Access



Journal: UNP Journal of Statistics and Data Science
Publication Date: Feb 25, 2024
License type: CC BY 4.0

Abstract

DANA is a digital payment platform whose features let users make payments, transfers, and balance top-ups online. Users of the DANA application post a wide range of reviews, both constructive and critical, which can serve as valuable input for its developers. This research evaluates the sentiment classification of DANA user reviews on the Google Play Store using the Naïve Bayes Classifier (NBC) method with Information Gain (IG) feature selection, and assesses the effect of IG feature selection on model performance. Reviews are divided into two categories, positive and negative, using lexicon-based labeling. Before model building, data weighting and feature selection are carried out and the data are split into 80% training data and 20% test data. Two models are built: a model without feature selection (the NBC model) and a model with feature selection (the NBC-IG model). The NBC model, with 1106 features, performed well, achieving 82.91% accuracy, 83.96% precision, and 90.23% recall. The NBC-IG model, with 536 features, performed better, achieving 85.09% accuracy, 85.79% precision, and 92.09% recall. Applying IG feature selection with an IG threshold of > 0.01 thus removed 570 features and improved accuracy by 2.18, precision by 1.83, and recall by 1.86 percentage points.
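The IG thresholding step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how IG was computed or which tools were used, so everything here is an assumption for illustration: each term is treated as a binary feature (present or absent in a review), entropy is measured in bits, and the function names and toy reviews are hypothetical. Only the cutoff of IG > 0.01 comes from the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of a term, treating it as a binary feature (assumption: present/absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_features(docs, labels, threshold=0.01):
    """Keep only the terms whose IG exceeds the threshold (0.01 in the abstract)."""
    vocab = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return {t: s for t, s in scores.items() if s > threshold}


# Toy, made-up reviews represented as token sets (not data from the study).
docs = [
    {"fast", "easy", "app"},
    {"easy", "helpful", "app"},
    {"error", "slow", "app"},
    {"error", "failed", "app"},
]
labels = ["positive", "positive", "negative", "negative"]

selected = select_features(docs, labels)
for term, score in sorted(selected.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.4f}")
```

In this toy example the token "app" occurs in every review, so it carries no information about the class and falls below the threshold, while the remaining terms are kept. In a pipeline like the one the abstract describes, the retained terms would then form the feature set for the NBC-IG model.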

Full Text