Abstract

The rapid growth of Web-based user-generated reviews has led to the emergence of Opinion Mining (OM) applications for analyzing users’ opinions toward products, services, and policies. Polarity lexicons often play a pivotal role in OM, indicating whether a term is positive or negative along with a numeric score. However, the commonly available domain-independent lexicons are not an optimal choice for every domain within OM applications, because the polarity of a term changes from one domain to another and such lexicons do not contain the correct polarity of a term for every domain. In this work, we focus on the problem of adapting a domain-dependent polarity lexicon from a set of labeled user reviews and a domain-independent lexicon, and we propose a unified learning framework based on information theory concepts that assigns terms the correct polarity (+ive, -ive) scores. Benchmarking on three datasets (car, hotel, and drug reviews) shows that our approach improves polarity classification performance by achieving higher accuracy. Moreover, using the derived domain-dependent lexicon changed the polarity of terms, and the experimental results show that our approach is more effective than the baseline methods.

Highlights

  • The continuous increase in the content of social media forums and online review sites has propelled the emergence of Opinion Mining (OM) applications

  • There is a chance of omitting important words that could be included by other methods

  • Our experiments show that the resulting lexicon is comparable to the existing lexicons in terms of the accuracy obtained for sentence-level polarity classification

Summary

Introduction

The continuous increase in the content of social media forums and online review sites has propelled the emergence of Opinion Mining (OM) applications. A manual strategy for building a polarity lexicon relies on a group of experts selecting and annotating the words by hand. Such a strategy is costly in terms of the time and effort required. The corpus-based approach can give sufficient coverage of specialized content by learning a domain-specific lexicon from a training corpus of labeled reviews in that domain. For example, the word “heartbeat” is scored as neutral (0.75) in the SWN. Such a score is inappropriate in the drug domain: in the sentence “This drug is good enough as it normalizes my heartbeat.”, the word should have a positive polarity score. One possible solution for such problems is to modify the polarity of the words by using the corpus-based approach [8].
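
To make the corpus-based idea concrete, the sketch below derives a domain-specific polarity score for each term from labeled reviews. It is not the framework proposed in the paper, whose details are not given here. It assumes a simple stand-in: the difference of pointwise mutual information (PMI) between a term and the positive and negative classes, which reduces to log2(P(term | pos) / P(term | neg)). The function name `pmi_polarity`, the `smoothing` parameter, and the four toy reviews are all hypothetical.

```python
import math
import re
from collections import Counter


def pmi_polarity(reviews, smoothing=1.0):
    """Score each term by PMI(term, pos) - PMI(term, neg).

    reviews: iterable of (text, label) pairs, label in {"pos", "neg"}.
    Returns a dict mapping term -> score; > 0 leans positive, < 0 negative.
    This is an illustrative stand-in, not the paper's method.
    """
    doc_counts = Counter()                      # number of reviews per class
    term_counts = {"pos": Counter(), "neg": Counter()}

    for text, label in reviews:
        doc_counts[label] += 1
        # Count each term once per review (document frequency).
        for term in set(re.findall(r"[a-z']+", text.lower())):
            term_counts[label][term] += 1

    vocab = set(term_counts["pos"]) | set(term_counts["neg"])
    scores = {}
    for term in vocab:
        # Smoothed estimates of P(term | class); smoothing avoids log(0).
        p_pos = (term_counts["pos"][term] + smoothing) / (doc_counts["pos"] + 2 * smoothing)
        p_neg = (term_counts["neg"][term] + smoothing) / (doc_counts["neg"] + 2 * smoothing)
        # PMI(term, pos) - PMI(term, neg) simplifies to this log-ratio.
        scores[term] = math.log2(p_pos / p_neg)
    return scores


# Hypothetical toy corpus from a drug-review domain.
toy_reviews = [
    ("This drug normalizes my heartbeat", "pos"),
    ("Good drug, steady heartbeat again", "pos"),
    ("This drug gave me a headache", "neg"),
    ("Terrible headache and nausea", "neg"),
]

scores = pmi_polarity(toy_reviews)
for term in ("heartbeat", "headache", "this"):
    print(f"{term}: {scores[term]:+.3f}")
```

On this toy corpus, "heartbeat" receives a positive score because it occurs only in positive reviews, while a term spread evenly across both classes scores zero. A fuller adaptation would presumably combine such corpus statistics with the prior score from the domain-independent lexicon, a step this sketch leaves out.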

