Abstract

Sentiment analysis, or opinion mining, is a key natural language processing task for extracting useful information from text documents drawn from numerous sources. A range of techniques, from simple rule-based and lexicon-based methods to more sophisticated machine learning algorithms, has been widely used with different classifiers to analyze sentiment. However, lexicon-based sentiment classification still suffers from low accuracy, mainly because competitive domain-oriented dictionaries are lacking. Similarly, machine learning-based sentiment classification faces accuracy constraints because of feature ambiguity in social data. One of the best ways to address the accuracy issue is to select the best feature set and reduce its size. This paper proposes a feature selection method, GAWA, which uses Wrapper Approaches (WA) to select the premier features and a Genetic Algorithm (GA) to reduce the size of that premier feature set. The novelty of this work is a modified fitness function for the heuristic GA, which computes the optimal features by reducing redundancy and thereby improves accuracy. This work aims to present a comprehensive hybrid sentiment model built on the proposed method, and it is expected to be valuable in developing new approaches to feature-set selection with better accuracy. The experiments revealed that these techniques could reduce the feature set by up to 61.95% without compromising accuracy. The new optimal feature sets enhanced the efficiency of the Naïve Bayes algorithm up to 92%. Compared with conventional feature selection methods, the proposed method achieved 11% better accuracy than PCA and 8% better than PSO. The results were also compared with prior work in the literature, and the proposed method outperformed the previous research.

Highlights

  • Online Twitter users share their emotions such as joy and sorrow about any product or activity

  • One of the biggest challenges to accuracy, the massive volume of features, is tackled

  • A novel method is proposed for optimal feature selection. It is based on two Wrapper approaches for premier feature selection and on the Genetic Algorithm, with a modified fitness function, for feature reduction
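
The general idea behind the highlights above, scoring a candidate feature subset by the accuracy of a classifier trained on it (wrapper-style evaluation) and letting a genetic algorithm search for smaller subsets, can be illustrated with a minimal sketch. This is not the paper's GAWA method: the abstract does not spell out the modified fitness function, so the form used here (validation accuracy minus a size penalty), the `penalty` weight, the toy data generator, and all function names are assumptions made purely for illustration.

```python
import math
import random

def make_toy_data(n_rows, n_features, n_informative, rng):
    """Synthetic binary 'term present' vectors; only the first columns carry signal."""
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = []
        for j in range(n_features):
            p = (0.8 if label else 0.2) if j < n_informative else 0.5
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y

def train_nb(X, y, mask):
    """Fit a Bernoulli Naive Bayes model on the columns selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        # Laplace-smoothed estimate of P(feature = 1 | class)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in cols]
        model[c] = (prior, probs)
    return cols, model

def predict_nb(cols, model, x):
    """Return the class with the highest log-posterior for one row."""
    best_class, best_score = 0, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for j, p in zip(cols, probs):
            score += math.log(p if x[j] else 1 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def fitness(mask, train, valid, penalty=0.1):
    """Validation accuracy minus a size penalty (assumed form, not the paper's)."""
    if not any(mask):
        return 0.0
    cols, model = train_nb(train[0], train[1], mask)
    hits = sum(predict_nb(cols, model, x) == label for x, label in zip(*valid))
    return hits / len(valid[1]) - penalty * sum(mask) / len(mask)

def evolve(train, valid, n_features, pop_size=20, generations=20, seed=0):
    """Search over 0/1 feature masks with a small genetic algorithm."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    # Start from the full feature set plus random subsets.
    pop = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with 5% probability per position
            child = [1 - bit if rng.random() < 0.05 else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

rng = random.Random(42)
N_FEATURES = 10
train = make_toy_data(120, N_FEATURES, 3, rng)
valid = make_toy_data(60, N_FEATURES, 3, rng)
best = evolve(train, valid, N_FEATURES)
full = [1] * N_FEATURES
print("selected columns:", [j for j, keep in enumerate(best) if keep])
print("fitness, full set:", fitness(full, train, valid))
print("fitness, selected:", fitness(best, train, valid))
```

Because the initial population contains the full feature set and the best masks are carried over unchanged each generation, the returned mask never scores below the full set under this particular fitness. On real text data the mask would index vocabulary terms or n-grams rather than synthetic columns, and the accuracy term would normally come from cross-validation rather than a single held-out split.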


Summary

Introduction

Online Twitter users share their emotions, such as joy and sorrow, about any product or activity. Other users can learn from existing reviews written by users who have experience with specific items [1]. Forbes reported that 2.5 quintillion bytes of data are generated every day [2]. For business analytics, this massive volume of data is valuable, but it contains a great deal of slang and redundancy [3]. This research emphasizes the need to evaluate user-generated data in order to detect, extract, and analyze user opinions, so that organizations can progress more efficiently and effectively [5].

