Performance Evaluation of Sentiment Analysis on Balanced and Imbalanced Dataset Using Ensemble Approach

Shini George, V Srividhya

doi:10.17485/ijst/v15i17.2339

Abstract

Background: Class imbalance is a recurring difficulty in sentiment analysis. In an imbalanced classification problem, the few minority-class instances do not provide sufficient information, so learning directly from an unbalanced dataset can produce unsatisfactory results. This work aims to address the problem of class imbalance.

Methods: This study first uses a novel Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset and then proposes an ensemble model, named Ensemble Bagging Support Vector Machine (EBSVM), for opinion mining. To measure the performance of the proposed approach, numerous analyses are conducted on both the imbalanced and the balanced datasets. The work then compares the effectiveness of the proposed model with three base classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). Customer reviews of restaurants are chosen as the dataset for this work. Accuracy, precision, recall, and F-measure are used as evaluation metrics.

Findings: According to the results, the proposed EBSVM model outperforms all the individual classifiers on both the imbalanced and the SMOTE-balanced datasets. The balanced EBSVM classifier achieves higher accuracy than the imbalanced EBSVM classifier. The precision, recall, and F-measure of the minority class are higher for the balanced classifiers than for the imbalanced classifiers.

Novelty: This paper evaluates the performance of opinion mining classifiers on imbalanced and balanced datasets. The work examines not only general opinions but also specific aspects such as food, service, ambiance, quality, and price. When compared with existing classification algorithms in the literature, the proposed model was found to outperform them.

Keywords: Bagging; Accuracy; Ensemble; Precision; Recall; F-measure
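The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it relies on: SMOTE-style oversampling of the minority class by interpolating between neighbouring minority samples, and per-class precision, recall, and F-measure. The function names, the toy feature vectors, and the choice of k are assumptions made for illustration. This is not the authors' code, and the EBSVM classifier itself is not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 6 majority and 3 minority feature vectors
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Oversample the minority class until both classes have the same size
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to compute the evaluation metrics.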

Full Text