Abstract

We propose a novel method for computing sentiment orientation that outperforms supervised learning approaches in time and memory complexity, with accuracy that is not statistically significantly different from theirs. The method relies on a novel approach to generating unigram, bigram and trigram lexicons. Called frequentiment, it calculates the frequency of features (words) in a document and averages their impact on the sentiment score, in contrast to documents that do not contain these features. We then use ensemble classification to improve the overall accuracy of the method. Importantly, the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3–5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles that take the lexicons' predictions as input, and supervised learners) on 10 Amazon review data sets and provide the first statistical comparison of sentiment annotation methods that includes ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.
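To make the idea of a feature's impact "as opposed to documents that do not contain these features" concrete, the following is a minimal sketch of one plausible reading, not the exact frequentiment formula, which the abstract does not give. It assumes each training document carries a numeric sentiment label, scores a word as the mean label of documents containing it minus the mean label of documents without it, and scores a new document as the average of its known word scores. The names `build_lexicon` and `score_document` are hypothetical.

```python
from collections import defaultdict


def build_lexicon(docs, labels):
    """Build a word -> score lexicon (illustrative assumption, not the paper's formula).

    docs   : list of token lists
    labels : list of numeric sentiment labels, e.g. -1 or +1
    """
    n = len(docs)
    total = sum(labels)
    sum_with = defaultdict(float)   # sum of labels over documents containing the word
    count_with = defaultdict(int)   # number of documents containing the word

    for tokens, label in zip(docs, labels):
        for word in set(tokens):    # count each word once per document
            sum_with[word] += label
            count_with[word] += 1

    lexicon = {}
    for word, c in count_with.items():
        if c == n:
            continue                # word is in every document: no contrast group
        mean_with = sum_with[word] / c
        mean_without = (total - sum_with[word]) / (n - c)
        lexicon[word] = mean_with - mean_without
    return lexicon


def score_document(tokens, lexicon):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    hits = [lexicon[w] for w in tokens if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


docs = [["great", "phone"], ["bad", "phone"], ["great", "battery"], ["bad", "screen"]]
labels = [1, -1, 1, -1]
lexicon = build_lexicon(docs, labels)
score = score_document(["great", "phone"], lexicon)
```

In this sketch a document would be labeled positive when its score exceeds a chosen threshold; how that threshold is selected, and how bigrams and trigrams are handled, is left open here.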

Highlights

  • Sentiment analysis of texts means assigning a measure of how positive, neutral or negative a text is

  • This paper continues our work presented in [27], where we used sentiment lexicons as first-stage classifiers and employed a decision tree as a fusion classifier that learned from the output of the lexicons

  • We propose a new method for lexicon generation, frequentiment, based on the increase in likelihood when a document contains a given feature, averaged into a score per feature


Summary

Introduction

Sentiment analysis of texts means assigning a measure of how positive, neutral or negative a text is. It can be performed by experts, automatically, or both, since different sentiment classifications can be treated as input to improve accuracy. To increase the accuracy of manual annotation, several annotators annotate a given text and the number of annotations that agree is checked. The intuition behind this approach is that if more people give the same response to the same text, the probability that the response is correct rises. On the other hand, this approach is expensive and time consuming, and it may require sophisticated methods of selecting annotators to attain a real rise in accuracy.
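The agreement check among annotators can be illustrated with a small sketch. It assumes simple majority voting over categorical labels and reports the share of annotators who chose the winning label; the function name `majority_label` is hypothetical, and real annotation studies may use more elaborate agreement measures.

```python
from collections import Counter


def majority_label(annotations):
    """Return the most frequent label and the fraction of annotators who chose it.

    Ties are resolved in favor of the label that appears first in the input.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


label, agreement = majority_label(["positive", "positive", "negative"])
```

A low agreement fraction would flag a text whose label is uncertain and may need more annotators.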

