Abstract

This study advocates assessing the sentiment orientation of documents using term specificity information. We interpret the mathematical meaning of term specificity information on the basis of Shannon’s entropy and, from that interpretation, introduce a general form of a specificity measure. Sentiment classification using specificity measures is proposed within a Bayesian learning framework; we clarify some potential problems that arise when the measures are applied to the estimation of posterior probabilities for the Naive Bayes (NB) classifier, and suggest solutions. A novel method is proposed that allows each document to have multiple representations, each corresponding to a sentiment class. Our experimental results show that, while both the proposed method and information retrieval (IR) techniques achieve high performance in sentiment classification, our method outperforms the IR techniques.

Highlights

  • The proliferation of web-centred social interaction has led to increasing quantities of opinion-dense text

  • We propose a general method to represent the statistical importance of terms pertaining to individual documents, estimating posterior probabilities for the Naive Bayes (NB) classifier using term weights obtained from Term Specificity Information (TSI)

  • We clarify some potential problems inherent in applying the specificity measures in a Bayesian learning framework and suggest solutions that are easy to apply in practice


Summary

INTRODUCTION

The proliferation of web-centred social interaction has led to increasing quantities of opinion-dense text. This study focuses on designing a method to represent documents using Term Specificity Information (TSI) for accurate and reliable sentiment classification (SC). There has been no systematic discussion of how to use TSI to represent documents for SC, and there are potential problems in applying specificity measures to the Naive Bayes (NB) classifier for SC. It is worth mentioning that [18] attempts to determine the specificity of nouns only, rather than considering all terms in documents. We propose a general method to represent the statistical importance of terms pertaining to individual documents, estimating posterior probabilities for the NB classifier using term weights obtained from TSI.
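To make the idea of feeding term weights into the NB posterior concrete, the sketch below may help. It is an illustration only and not the authors' method: it assumes a smoothed inverse document frequency, log(1 + N/df), as a stand-in specificity measure, and it multiplies term frequencies by that weight before Laplace-smoothed multinomial NB estimation. All function names, the toy documents, and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


def specificity_weights(docs):
    """Assumed stand-in for a TSI measure: smoothed IDF, log(1 + N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}


def train_weighted_nb(docs, labels, weights, alpha=1.0):
    """Estimate class priors and P(term | class) from specificity-weighted counts."""
    vocab = sorted(weights)
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    mass = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        for term, tf in Counter(doc).items():
            # Weighted count: term frequency scaled by the term's specificity.
            mass[label][term] += tf * weights[term]
    cond = {}
    for c in classes:
        total = sum(mass[c].values()) + alpha * len(vocab)
        cond[c] = {t: (mass[c][t] + alpha) / total for t in vocab}
    return priors, cond


def posterior(doc, priors, cond):
    """Normalised posterior P(class | doc); terms outside the vocabulary are skipped."""
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for term in doc:
            if term in cond[c]:
                score += math.log(cond[c][term])
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Toy, made-up training data (tokenised documents with sentiment labels).
train_docs = [
    ["great", "plot", "great", "acting"],
    ["dull", "plot", "poor", "acting"],
    ["great", "fun"],
    ["poor", "dull", "boring"],
]
train_labels = ["pos", "neg", "pos", "neg"]

weights = specificity_weights(train_docs)
priors, cond = train_weighted_nb(train_docs, train_labels, weights)
post = posterior(["great", "fun"], priors, cond)
print(post)
```

Weighted counts are no longer integer event counts, so the resulting estimates are not maximum-likelihood multinomial parameters in the usual sense. This may be one of the kinds of issue the study has in mind when it refers to potential problems in applying specificity measures to the NB classifier, though the sketch does not attempt to reproduce the paper's analysis or its solutions.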

A General Form of a TSI Measure
Example TSI Measures
The NB Classifier
Estimation of Posterior Probabilities
Problems
PROBLEMS APPLYING TSI FOR SC
Solutions
EXPERIMENTS
Findings
CONCLUSIONS