Abstract
Sentiment analysis is about the classification of sentiments expressed in review documents. To improve classification accuracy, feature selection methods are often used to rank features so that non-informative and noisy features with low ranks can be removed. In this study, we propose a new feature selection method, called query expansion ranking, which is based on query expansion term weighting methods from the field of information retrieval. We compare the proposed method with other widely used feature selection methods, including Chi square, information gain, document frequency difference, and optimal orthogonal centroid, using four classifiers: naïve Bayes multinomial, support vector machines, maximum entropy modelling, and decision trees. We test them on movie reviews and several kinds of product reviews in both Turkish and English, so that we can show their performance across different domains, languages, and classifiers. We observe that the proposed method achieves consistently better performance than the other feature selection methods. We also observe that query expansion ranking, Chi square, information gain, and document frequency difference tend to produce better results for both the English and Turkish reviews when tested with the naïve Bayes multinomial classifier.
Highlights
Sentiment analysis is about the classification of sentiments expressed in review documents
Among the feature selection methods we considered, information gain (IG) and optimal orthogonal centroid feature selection (OCFS) are good at distinguishing multiple classes, while Chi square (CHI2), document frequency difference (DFD), and query expansion ranking (QER) are restricted to two classes; all of them are suitable for sentiment analysis
Our results show that for all Turkish review datasets the best results are obtained with the naïve Bayes multinomial (NBM) classifier, while for some English review datasets logistic regression (LR) and support vector machines (SVM) perform best
Summary
Sentiment analysis is about the classification of sentiments expressed in review documents. A number of studies on sentiment analysis use different approaches for data preprocessing, feature selection, and sentiment classification [1, 3, 4, 6, 7, 8, 9, 10]. Statistical methods such as Chi square (CHI2) and information gain (IG) are used to eliminate unnecessary or irrelevant features so that classification performance can be improved [11]. Sentiment-expressing words like “great” are not very frequent within a particular review, but can be more frequent across different reviews, and a good feature selection method for sentiment analysis should take this observation into account.
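To make the idea of ranking features with a statistical score concrete, the following is a minimal sketch of Chi square feature ranking for two-class reviews. It is not the authors' implementation: it assumes the standard 2x2 contingency form of the Chi square statistic, and the function names (`chi2_scores`, `select_top_k`) and the toy reviews are hypothetical. Terms are counted once per review (document frequency), in line with the observation that a sentiment word may appear only once in a review yet recur across many reviews.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term with the 2x2 Chi square statistic.

    docs: list of token lists (one per review).
    labels: list of 1 (positive) or 0 (negative), same length as docs.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # set() counts presence per review, not occurrences within it
        for term in set(tokens):
            (df_pos if label == 1 else df_neg)[term] += 1

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive reviews containing the term
        b = df_neg[term]   # negative reviews containing the term
        c = n_pos - a      # positive reviews without the term
        d = n_neg - b      # negative reviews without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data (hypothetical): two positive and two negative reviews.
docs = [
    ["great", "movie"],
    ["great", "acting"],
    ["boring", "movie"],
    ["bad", "movie"],
]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
top_terms = select_top_k(scores, 1)
print(top_terms, scores[top_terms[0]])
```

In this toy example "great" appears in every positive review and in no negative review, so it receives the highest score, while "movie" appears in both classes and ranks lower. Low-ranked terms would be the ones removed before training a classifier.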