Abstract
This study advocates assessing the sentiment orientation of documents using term specificity information. We interpret the mathematical meaning of term specificity information in terms of Shannon’s entropy and, from that interpretation, introduce a general form of a specificity measure. We propose sentiment classification using the specificity measures within a Bayesian learning framework, clarify some potential problems that arise when the measures are applied to estimating posterior probabilities for the Naive Bayes (NB) classifier, and suggest solutions. We also propose a novel method that allows each document to have multiple representations, each corresponding to a sentiment class. Our experimental results show that, while both the proposed method and information retrieval (IR) techniques achieve high performance for sentiment classification, our method outperforms the IR techniques.
Highlights
The proliferation of web-centred social interaction has led to increasing quantities of opinion-dense text
We propose a general method to represent the statistical importance of terms pertaining to individual documents, estimating posterior probabilities for the Naive Bayes (NB) classifier using term weights obtained from term specificity information (TSI)
We clarify some potential problems inherent in applying the specificity measures in a Bayesian learning framework and suggest solutions that are easy to apply in practice
Summary
The proliferation of web-centred social interaction has led to increasing quantities of opinion-dense text. This study focuses on the design of a method to represent documents using Term Specificity Information (TSI) for accurate and reliable sentiment classification (SC). There has been no systematic discussion of how to use TSI to represent documents for SC, and there are some potential problems in applying specificity measures to the Naive Bayes (NB) classifier for SC. It is worth mentioning that [18] attempts to determine the specificity of nouns only, rather than considering all terms in documents. We propose a general method to represent the statistical importance of terms pertaining to individual documents, estimating posterior probabilities for the NB classifier using term weights obtained from TSI.
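To make the idea of specificity-weighted NB scoring concrete, the following is a minimal sketch only. It assumes a smoothed IDF-style weight, -log2((df + 1) / (N + 2)), as a stand-in for a specificity measure; this is not the measure defined in the paper, and the per-class multiple representations are not reproduced. The function names `specificity_weights`, `train`, and `classify`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def specificity_weights(docs):
    """Assumed stand-in for a specificity measure: a smoothed IDF-style
    weight, -log2((df + 1) / (N + 2)), where df is the number of documents
    containing the term and N is the number of documents."""
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    return {t: -math.log2((df + 1) / (n_docs + 2)) for t, df in doc_freq.items()}


def train(labelled_docs):
    """Fit a multinomial NB model with Laplace smoothing and attach the
    collection-wide term weights. Input: list of (tokens, label) pairs."""
    docs = [tokens for tokens, _ in labelled_docs]
    weights = specificity_weights(docs)
    vocab = set(weights)
    classes = {}
    for label in {lab for _, lab in labelled_docs}:
        class_docs = [tokens for tokens, lab in labelled_docs if lab == label]
        counts = Counter(t for d in class_docs for t in d)
        denom = sum(counts.values()) + len(vocab)
        classes[label] = (
            math.log(len(class_docs) / len(docs)),                   # log prior
            {t: math.log((counts[t] + 1) / denom) for t in vocab},  # log P(t|c)
        )
    return weights, classes


def classify(model, tokens):
    """Return the class with the highest weighted log-posterior score.
    Terms not seen in training are ignored."""
    weights, classes = model

    def score(label):
        log_prior, log_like = classes[label]
        return log_prior + sum(
            weights[t] * log_like[t] for t in tokens if t in weights
        )

    return max(classes, key=score)


# Toy usage with made-up data.
training = [
    (["great", "fun", "movie"], "pos"),
    (["great", "acting"], "pos"),
    (["boring", "movie"], "neg"),
    (["awful", "boring", "plot"], "neg"),
]
model = train(training)
label = classify(model, ["great", "fun"])
```

In this sketch each term's log-likelihood is scaled by its weight, so rarer (more specific) terms contribute more to the class score than terms that occur in most documents.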
More From: International Journal of Advanced Computer Science and Applications