Abstract

Sentiment analysis (SA) is an active research area that aims to classify online unstructured user-generated content (UUGC) into positive and negative classes. Reliable training data is vital for learning a sentiment classifier, but because of domain heterogeneity, manually constructing reliable labeled sentiment corpora is laborious and time-consuming. In the absence of enough labeled data, sentiment lexicons and semi-supervised learning approaches have attracted substantial attention from the research community as alternatives. However, state-of-the-art techniques for semi-supervised sentiment classification leave open research questions such as the following. How can the significant information concealed in unstructured data be utilized effectively? How can the model be learned from the most effective sentiment features? How can noisy and redundant features be removed? How can the initial training data be refined for initial model learning, given that random selection may degrade performance? In addition, existing lexicons mainly suffer from limited word coverage and may therefore miss key domain-specific sentiment words, so further research is required to improve sentiment lexicons for textual sentiment classification. To address these issues, we propose LeSSA, a novel unified sentiment analysis framework for textual sentiment classification. Our main contributions are fourfold: (a) lexicon construction, generating a high-quality, wide-coverage sentiment lexicon; (b) training classification models on a high-quality training dataset generated using k-means clustering, active learning, self-learning, and co-training algorithms; (c) classification fusion, whereby the predictions from multiple learners are combined to determine the final sentiment polarity by majority voting; and (d) practicality, that is, we validate our claims by applying our model to benchmark datasets. The empirical evaluation on benchmark datasets from multiple domains demonstrates that the proposed framework outperforms existing semi-supervised learning techniques in terms of classification accuracy.
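
The classification fusion step described in the abstract can be illustrated with a minimal sketch of majority voting. This is not the LeSSA implementation: the function name `majority_vote`, the three example learners, their predictions, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, tie_break="negative"):
    """Fuse label lists from several learners into one label per document.

    predictions: list of equal-length label lists, one list per learner.
    tie_break:   label returned when the top two labels get the same count
                 (an assumption; the abstract does not say how ties are handled).
    """
    fused = []
    for votes in zip(*predictions):  # all learners' votes for one document
        counts = Counter(votes).most_common()
        top_label, top_count = counts[0]
        if len(counts) > 1 and counts[1][1] == top_count:
            fused.append(tie_break)
        else:
            fused.append(top_label)
    return fused


# Hypothetical predictions from three learners over four documents.
lexicon_preds = ["positive", "negative", "positive", "negative"]
self_training_preds = ["positive", "positive", "negative", "negative"]
co_training_preds = ["negative", "positive", "positive", "negative"]

fused = majority_vote([lexicon_preds, self_training_preds, co_training_preds])
print(fused)  # ['positive', 'positive', 'positive', 'negative']
```

With an odd number of learners and two classes, ties cannot occur, so the tie-breaking rule only matters when the number of learners is even.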

Highlights

  • In the last two decades, the web has become a primary source where people look for information and share experiences and perceptions in the form of comments or opinions

  • To address open issues in state-of-the-art work, we propose a novel unified Sentiment Analysis (SA) framework based on high-quality, wide-coverage sentiment lexicons and semi-supervised learning techniques in conjunction with classification fusion for textual sentiment classification

  • SA is an influential research area that aims to classify online users' reviews into positive and negative classes; the main difficulty in the SML approach is the absence of enough labeled data

Introduction

In the last two decades, the web has become a primary source where people look for information and share experiences and perceptions in the form of comments or opinions. The number of internet users is growing quickly, and the volume of data generated on social network sites (SNSs) is very large. According to a statistical report issued in January 2017 by Hootsuite, 3.8 billion people … Another report, issued in April 2016, specified the strength of the most popular SNSs, such as Instagram (400 million users), Appl.

