Abstract

Sentiment classification, or sentiment analysis, is recognized as an open research domain. In recent years, a large body of research has been carried out in this field using numerous methodologies. Feature generation and feature selection are consequential for text mining, because a high-dimensional feature set can degrade the performance of sentiment analysis. This paper investigates how well the widely used feature selection methods (IG, Chi Square, Gini Index) perform, individually and in combination, with four machine learning classification algorithms. The proposed methods are evaluated on three standard datasets, namely the IMDb movie review, electronics product review, and kitchen product review datasets. First, feature subsets are selected with the three feature selection methods. Next, the UNION, INTERSECTION, and revised UNION methods are applied to merge these subsets, yielding all top-ranked features, including those selected in common. Finally, the classifiers SMO, MNB, RF, and LR (logistic regression) are trained on the resulting feature vector to classify the review datasets. Performance is measured with precision, recall, F-measure, and the ROC curve. Experimental results show that the combined method achieved its best accuracy, 92.31%, with the SMO classifier, which is encouraging and comparable to related research.
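The feature selection and merging steps can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: features are binary term presence, the Gini index uses one common text-classification variant (the sum over classes of P(t|c)^2 * P(c|t)^2), which the abstract does not specify, and the toy corpus, the function names `score_terms` and `top_k`, and the cut-off k = 3 are hypothetical. The paper's revised UNION method is not reproduced, since its definition is not given here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def score_terms(docs, labels):
    """Score each term by information gain, chi-square and an assumed Gini variant.

    Features are binary term presence; `docs` is a list of token lists.
    """
    classes = sorted(set(labels))
    n = len(docs)
    class_n = Counter(labels)
    # Document frequency of each term, split by class
    df = {}
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df.setdefault(t, Counter())[y] += 1

    h_c = entropy([class_n[k] for k in classes])
    ig, chi2, gini = {}, {}, {}
    for t, with_t in df.items():
        n_t = sum(with_t.values())
        without_t = [class_n[k] - with_t[k] for k in classes]
        # IG = H(C) - P(t) H(C|t) - P(not t) H(C|not t)
        ig[t] = (h_c
                 - n_t / n * entropy([with_t[k] for k in classes])
                 - (n - n_t) / n * entropy(without_t))
        # Chi-square on the 2x2 table, first class vs. the rest
        # (for two classes the score is the same whichever class is used)
        c0 = classes[0]
        a = with_t[c0]               # in c0, contains t
        b = n_t - a                  # not in c0, contains t
        c = class_n[c0] - a          # in c0, lacks t
        d = (n - class_n[c0]) - b    # not in c0, lacks t
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Assumed Gini variant: sum over classes of P(t|k)^2 * P(k|t)^2
        gini[t] = sum((with_t[k] / class_n[k]) ** 2 * (with_t[k] / n_t) ** 2
                      for k in classes)
    return ig, chi2, gini


def top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


# Hypothetical toy corpus, not from the paper's datasets
docs = [s.split() for s in [
    "great movie loved it",
    "great acting great plot",
    "boring movie hated it",
    "boring plot bad acting",
]]
labels = ["pos", "pos", "neg", "neg"]

ig, chi2, gini = score_terms(docs, labels)
subsets = [top_k(s, 3) for s in (ig, chi2, gini)]
union = set.union(*subsets)                # every term any method ranked highly
intersection = set.intersection(*subsets)  # terms all three methods agree on
print(sorted(union), sorted(intersection))
```

UNION yields a larger, more inclusive feature vector, whereas INTERSECTION keeps only the terms on which all three methods agree, giving a smaller, more conservative one.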

Highlights

  • An opinion is a viewpoint or judgment about a specific thing that acts as a key influence on an individual's decision-making process

  • The main contributions of the paper are: 1. We propose a novel sentiment classification method based on feature selection and machine learning (ML) techniques, and evaluate it on three standard benchmark datasets: movie reviews from the Internet Movie Database (IMDb), and the Electronics and Kitchen review datasets

  • An in-depth investigation was carried out to measure the effectiveness of the proposed approach, i.e., to compare the performance of four supervised classifiers, Sequential Minimal Optimization (SMO), multinomial naïve Bayes (MNB), random forest (RF), and logistic regression (LR), under combinations of the different feature selection methods
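To make the classification and evaluation step concrete, here is a minimal pure-Python sketch of one of the four compared classifiers, multinomial naïve Bayes with Laplace smoothing, trained only on a selected feature subset and scored with precision, recall, and F-measure. SMO, RF, LR, and the ROC curve are not shown. The training sentences, the selected vocabulary, and the helper names `train_mnb`, `predict_mnb`, and `precision_recall_f1` are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math
from collections import Counter


def train_mnb(docs, labels, vocab, alpha=1.0):
    """Fit multinomial naive Bayes using only the terms in `vocab`.

    `alpha` is the Laplace smoothing constant (add-one by default).
    """
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Term counts inside this class, restricted to the selected features
        counts = Counter(t for d in class_docs for t in d if t in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unselected terms are ignored."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data, not from the paper's datasets
train_docs = [s.split() for s in [
    "great movie loved it",
    "great acting great plot",
    "boring movie hated it",
    "boring plot bad acting",
]]
train_labels = ["pos", "pos", "neg", "neg"]
vocab = {"great", "boring", "bad"}  # hypothetical output of a feature-selection step

model = train_mnb(train_docs, train_labels, vocab)
test_docs = [s.split() for s in ["great fun movie", "boring bad ending",
                                 "great plot but boring"]]
y_true = ["pos", "neg", "neg"]
y_pred = [predict_mnb(model, d) for d in test_docs]
print(y_pred, precision_recall_f1(y_true, y_pred, positive="pos"))
```

On this toy data the mixed review "great plot but boring" is labeled positive, so precision for the positive class drops to 0.5 while recall stays at 1.0.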


Summary

Introduction

An opinion is a viewpoint or judgment about a specific thing that acts as a key influence on an individual's decision-making process. People's beliefs and the choices they make always depend on how others see and evaluate the world. Opinions hold high value in many aspects of life. Sentiment analysis is the process of determining whether the opinions or sentiments expressed in textual documents are positive or negative. In recent years, this field has been widely embraced by researchers because of its broad range of applications in many fields, which benefit from the results of sentiment analysis. Because so many movies are available these days, it has become difficult for audiences to choose movies of their preferred genre.
