A Comparative Study of Feature Selection and Machine Learning Methods for Sentiment Classification on Movie Data Set

C Selvi, E Sivasankar, Chakshu Ahuja

doi:10.1007/978-81-322-2268-2_39

Abstract

Sentiment analysis has become a leading research domain since the advent of Web 2.0, where Web users express their opinions in user forums, blogs, discussion boards, and review sites. This online information is a valuable source for decision making, for improving the quality of service, and for helping service providers enhance their competitiveness. Because processing high-dimensional text data does not scale well, feature selection mechanisms are used to restrict the analysis to the most informative features. These features are then used to train a classifier, with the aim of improving the accuracy of sentiment-based classification. This paper evaluates six feature selection mechanisms, namely Information Gain (IG), Gain Ratio (GR), Chi-square (CHI), OneR, Relief-F, and Significance Attribute Evaluation (SAE), in combination with five machine learning classifiers, namely Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbour (K-NN), and Maximum Entropy (ME), and reports the classification accuracy of each combination on the movie review data set. The comparative results show that Naive Bayes (NB) outperforms the other classifiers and performs best with the Gain Ratio (GR) and Significance Attribute Evaluation (SAE) feature selection methods.
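The pipeline summarized above (score features, keep only the most informative ones, then train a classifier on them) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, which the abstract does not describe. It assumes binary bag-of-words features, uses Information Gain as the selection criterion, and pairs it with a Bernoulli-style Naive Bayes with add-one smoothing; the choice of NB variant is an assumption. The toy corpus and the names `tokenize`, `information_gain`, `select_top_k`, and `BernoulliNaiveBayes` are hypothetical and chosen only for illustration. Gain Ratio would divide each IG score by the split information of the feature; that variant and the other selectors and classifiers are not shown.

```python
import math

# Hypothetical toy corpus (illustrative only; not the paper's movie review data).
DOCS = [
    "a great moving film",
    "great acting and a superb plot",
    "superb and moving",
    "a dull boring film",
    "boring plot and poor acting",
    "poor and dull",
]
LABELS = ["pos", "pos", "pos", "neg", "neg", "neg"]


def tokenize(text):
    """Binary bag-of-words: the set of lower-cased whitespace tokens."""
    return set(text.lower().split())


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the labels."""
    classes = sorted(set(labels))
    base = entropy([labels.count(c) for c in classes])
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for subset in (with_t, without_t):
        if subset:
            weight = len(subset) / len(labels)
            conditional += weight * entropy([subset.count(c) for c in classes])
    return base - conditional


def select_top_k(docs, labels, k):
    """Rank every vocabulary term by IG (ties broken alphabetically); keep the top k."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


class BernoulliNaiveBayes:
    """Naive Bayes over binary presence features, with Laplace (add-one) smoothing."""

    def fit(self, docs, labels, features):
        self.features = list(features)
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.p_present = {}
        for c in self.classes:
            docs_c = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(docs_c) / len(docs))
            # P(feature present | class c), smoothed so no probability is 0 or 1.
            self.p_present[c] = {
                f: (sum(f in d for d in docs_c) + 1) / (len(docs_c) + 2)
                for f in self.features
            }
        return self

    def log_score(self, doc, c):
        """Log prior plus the log-likelihood of each selected feature's presence/absence."""
        score = self.log_prior[c]
        for f in self.features:
            p = self.p_present[c][f]
            score += math.log(p if f in doc else 1.0 - p)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


train = [tokenize(t) for t in DOCS]
selected = select_top_k(train, LABELS, k=4)  # keep 4 of the 11 vocabulary terms
model = BernoulliNaiveBayes().fit(train, LABELS, selected)
print(selected)
print(model.predict(tokenize("a great and moving story")))
print(model.predict(tokenize("boring and dull")))
```

Words that occur in both classes (for example "film" or "and") receive an IG of zero and are discarded, so the classifier is trained only on the terms that separate the two sentiment classes in this toy data.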
