Abstract
PurposeThis paper aims to propose a statistical and context-aware feature reduction algorithm that improves sentiment classification accuracy. Classification of reviews with different granularities in two classes of reviews with negative and positive polarities is among the objectives of sentiment analysis. One of the major issues in sentiment analysis is feature engineering while it severely affects time complexity and accuracy of sentiment classification.Design/methodology/approachIn this paper, a feature reduction method is proposed that uses context-based knowledge as well as synset statistical knowledge. To do so, one-dimensional presentation proposed for SentiWordNet calculates statistical knowledge that involves polarity concentration and variation tendency for each synset. Feature reduction involves two phases. In the first phase, features that combine semantic and statistical similarity conditions are put in the same cluster. In the second phase, features are ranked and then the features which are given lower ranks are eliminated. The experiments are conducted by support vector machine (SVM), naive Bayes (NB), decision tree (DT) and k-nearest neighbors (KNN) algorithms to classify the vectors of the unigram and bigram features in two classes of positive or negative sentiments.FindingsThe results showed that the applied clustering algorithm reduces SentiWordNet synset to less than half which reduced the size of the feature vector by less than half. In addition, the accuracy of sentiment classification is improved by at least 1.5 per cent.Originality/valueThe presented feature reduction method is the first use of the synset clustering for feature reduction. In this paper features reduction algorithm, first aggregates the similar features into clusters then eliminates unsatisfactory cluster.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.