Abstract

Sentiment Analysis (SA) aims to extract useful information from online Unstructured User-Generated Content (UUGC) and classify it into positive and negative classes. State-of-the-art SA techniques suffer from a high-dimensional feature space because the UUGC contains noisy and irrelevant features. Researchers have proposed feature extraction and selection techniques to reduce this feature space, but these techniques fall short in extracting and selecting the most effective sentiment features for sentiment model learning. Effective feature extraction and selection matter for SA because they can boost the learning algorithm's predictive performance while reducing the dimensionality of the feature space. To address these concerns, we propose an Intelligent Hybrid Feature Selection for Sentiment Analysis (IHFSSA) based on ensemble learning methods. IHFSSA first identifies sentiment features in the review text using the Penn Treebank part-of-speech tagset and integrated Wide Coverage Sentiment Lexicons (WCSL). A subset of the sentiment features is then selected with a fast and simple rank-based ensemble of multiple filter feature selection methods. The selected sentiment features are further refined by applying a wrapper-based backward feature selection method. Finally, for textual sentiment classification, the well-known classification algorithms Support Vector Machine (SVM), Naive Bayes (NB), and Generalized Linear Model (GLM) are trained in an ensemble model on the refined sentiment feature set. An in-depth evaluation on benchmark datasets from heterogeneous domains demonstrates that IHFSSA outperforms existing SA techniques.
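
The rank-based ensemble of filters mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the filter names, the scores, the use of the mean rank as the aggregation rule, and the helper names `rank_features` and `ensemble_select` are all assumptions made for illustration only.

```python
from statistics import mean


def rank_features(scores):
    """Map each feature to its rank (1 = best) under one filter's scores."""
    # Higher score is treated as better; ties are broken alphabetically.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_select(filter_scores, k):
    """Average the per-filter ranks and keep the k best-ranked features."""
    per_filter_ranks = [rank_features(s) for s in filter_scores.values()]
    features = list(per_filter_ranks[0])
    # Assumption: the mean rank is the aggregation rule.
    mean_rank = {f: mean(r[f] for r in per_filter_ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical filter scores for four candidate sentiment features.
filter_scores = {
    "information_gain": {"good": 0.9, "bad": 0.8, "plot": 0.1, "awful": 0.6},
    "chi_square": {"good": 12.0, "bad": 15.0, "plot": 0.5, "awful": 9.0},
    "gini": {"good": 0.7, "bad": 0.6, "plot": 0.2, "awful": 0.65},
}

selected = ensemble_select(filter_scores, k=2)
print(selected)
```

Working on ranks rather than raw scores means filters with very different score scales contribute equally, which is one plausible reason to prefer a rank-based combination.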

Highlights

  • Blogs, discussion forums, shared knowledge-seeking networks, social network platforms, and product and movie review portals [1]–[5] are only a handful of the social media platforms that have emerged with Web 2.0 [6], [7]

  • We focus on document-level Sentiment Analysis (SA): each document is split into sentences with a sentence parser, and each sentence is split into words with a tokenizer

  • We propose an intelligent model for textual SA based on a hybrid feature selection with ensemble learning methods

Summary

INTRODUCTION

Blogs, discussion forums, shared knowledge-seeking networks, social network platforms, and product and movie review portals [1]–[5] are only a handful of the social media platforms that have emerged with Web 2.0 [6], [7].

Jing et al. [15] proposed two feature selection methods, modified categorical proportional difference (MCPD) and balance category feature (BCF), that select attributes from text reviews. Their experimental results showed that combining the BCF and MCPD methods can reduce the feature space and improve sentiment classification performance. Kalaivani et al. [49] proposed a machine learning-based feature selection method using Information Gain (IG) and a Genetic Algorithm. For evaluation, they applied NB, logistic regression, SVM, and ensemble techniques to multi-domain datasets and movie review datasets.

According to the literature review, researchers have introduced different feature extraction and/or selection strategies, as well as ensemble learning methods, for sentiment classification. The technical details of the proposed methodology are elaborated in the following sub-sections.

FEATURE REPRESENTATION
INTEGRATED WIDE COVERAGE SENTIMENT LEXICONS
SENTIMENT FEATURES EXTRACTION
FEATURES SELECTION
WRAPPER-BASED BACKWARD FEATURE SELECTION
CLASSIFICATION ALGORITHMS AND ENSEMBLE LEARNING METHOD
EVALUATION MEASURES
EXPERIMENTAL SETTING
PERFORMANCE ANALYSIS OF HYBRID FEATURE SELECTION APPROACH
RESULTS SUMMARY
CONCLUSION
