Abstract

Sentiment classification (SC) is an active field of research concerned with the computational treatment of opinions, sentiments, and subjectivity in text. Imbalanced classification has recently been shown to be a challenge for the SC research community. Most existing studies assume a balance between negative and positive samples, an assumption that may not hold in reality. This work describes a method for addressing imbalanced sentiment classification using supervised term weighting schemes and shows how these schemes can improve the performance of sentiment classification on imbalanced data, especially in multi-class classification. To identify the most appropriate scheme, five term weighting schemes are compared: tf-idf, tf-idf-icf, tf-rf, tf-igm, and sqrt_tf-igm. This work also compares four supervised machine learning algorithms to identify an appropriate classifier: k-Nearest Neighbor (k-NN), Multinomial Naïve Bayes (MNB), Support Vector Machines (SVM) with a linear kernel, and SVM with an RBF kernel. Evaluated by F1, sqrt_tf-igm outperformed all other weighting schemes. Overall, sqrt_tf-igm returned better results than tf-idf, tf-idf-icf, and tf-rf, with an F1 improvement of 10.94%, and was slightly better than tf-igm.
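
For readers unfamiliar with the two best-performing schemes, the following is a minimal sketch of how tf-igm and sqrt_tf-igm weights are commonly computed. The abstract does not state the formula; the sketch assumes the usual definition from the term-weighting literature, in which a term's per-class document frequencies are sorted in descending order as f_1 >= f_2 >= ... >= f_m, igm = f_1 / sum(f_r * r), and the weight is tf * (1 + lambda * igm), with sqrt(tf) replacing tf in the sqrt variant. The function names and the default lambda = 7 are illustrative assumptions, not details taken from this work.

```python
import math


def igm(class_doc_freqs):
    """Inverse gravity moment of a term (assumed definition, see lead-in).

    class_doc_freqs: one document frequency per class for a single term.
    """
    ranked = sorted(class_doc_freqs, reverse=True)  # f_1 >= f_2 >= ... >= f_m
    if not ranked or ranked[0] == 0:
        return 0.0  # term never occurs in the training data
    # Gravity moment: each frequency weighted by its rank (1-based).
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf, class_doc_freqs, lam=7.0, sqrt_tf=False):
    """tf-igm weight, or sqrt_tf-igm when sqrt_tf=True.

    lam is the adjustable coefficient; 7.0 is an assumed default.
    """
    local = math.sqrt(tf) if sqrt_tf else float(tf)
    return local * (1.0 + lam * igm(class_doc_freqs))


# A term seen in only one class is maximally discriminative (igm = 1).
print(tf_igm(4, [10, 0, 0]))                 # 32.0
print(tf_igm(4, [10, 0, 0], sqrt_tf=True))   # 16.0
# A term spread evenly over three classes gets a much smaller igm (1/6).
print(igm([5, 5, 5]))
```

The square root dampens the influence of terms that repeat many times within a single document, which is the only difference between the two variants in this sketch.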

