Abstract

Research on sentiment analysis is growing rapidly and attracting wide attention from both academia and industry. Feature generation and selection are important for text mining because a high-dimensional feature set can degrade the performance of sentiment analysis. This paper demonstrates the efficacy of the proposed combined feature selection technique on machine learning classification algorithms, compared with the individual techniques used alone. First, we transform the review datasets into feature vectors of unigram features together with bi-tagged features based on part-of-speech (POS) patterns. Next, the information gain (IG), chi-squared (χ²) and minimum redundancy maximum relevancy (mRMR) feature selection methods are applied to obtain an optimal feature subset. These features are then given as input to multiple machine learning classifiers, namely support vector machine (SVM), multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB) and logistic regression (LR), on multi-domain product review datasets. The performance of the algorithms is measured with evaluation metrics such as precision, recall, and F-measure. Experimental results show that the mRMR feature selection method with SVM achieved a better accuracy of 91.39, which is encouraging and comparable to related research.
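
To make the feature selection step concrete, the following is a minimal sketch of information gain (IG) scoring over binary presence features. It is not the authors' implementation. It assumes plain adjacent-word bigrams as a simple stand-in for the POS-based bi-tagged features, and the four toy reviews, their labels, and all function names are hypothetical illustrations.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def unigrams_and_bigrams(text):
    """Set of unigrams plus adjacent-word bigrams (assumed stand-in for
    the POS-based bi-tagged features described in the abstract)."""
    tokens = text.lower().split()
    return set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def information_gain(docs, labels):
    """Return {feature: IG}, treating each feature as present/absent."""
    feature_sets = [unigrams_and_bigrams(d) for d in docs]
    base = entropy(labels)
    vocabulary = set().union(*feature_sets)
    scores = {}
    for feature in vocabulary:
        present = [y for fs, y in zip(feature_sets, labels) if feature in fs]
        absent = [y for fs, y in zip(feature_sets, labels) if feature not in fs]
        # Conditional entropy of the class given presence/absence.
        conditional = (
            len(present) * entropy(present) + len(absent) * entropy(absent)
        ) / len(labels)
        scores[feature] = base - conditional
    return scores


def top_k(scores, k):
    """The k highest-scoring features; ties are broken alphabetically."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy reviews, not taken from the paper's datasets.
docs = [
    "good phone great battery",
    "bad phone poor battery",
    "great screen good value",
    "poor screen bad value",
]
labels = ["pos", "neg", "pos", "neg"]

scores = information_gain(docs, labels)
selected = top_k(scores, 4)
print(selected)
```

In this toy corpus the opinion words separate the classes perfectly and receive the highest scores, while topic words such as "phone" score zero. A selected subset of this kind would then be passed to a classifier such as SVM or Naïve Bayes.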
