Abstract

With the growth of online social media, the volume of user-generated reviews has increased dramatically. Such text-based user-generated content is useful for estimating public sentiment. Many sentiment analysis studies assume that the sentiment expressed in online reviews can be retrieved from general text features. However, text redundancy and corpus volume can affect analysis performance, especially when strict corpus size constraints are applied. This paper proposes a sentiment subset selection framework that constructs a small set of documents from the original corpus to convey a subjective representation of it. The framework filters out sentiment-irrelevant information using topic modeling and selects subsets by submodular maximization under a cardinality constraint. Compared with the conventional submodular score function, our proposed score function enables the framework to capture the fine-grained sentiment features expressed in reviews. We conduct an empirical analysis of the efficacy of the proposed sentiment subset selection framework (SentiSS) across different context domains. We also perform a comparative study of the subset's impact on the metrics at different sentiment levels, namely positive, neutral, and negative. Experimental results show that the SentiSS framework can compress a sentiment corpus while maintaining the classifier's performance on these metrics.
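The greedy procedure commonly used for submodular maximization under a cardinality constraint can be sketched as follows. This is not the SentiSS score function, which the abstract does not specify. As an assumption, the sketch uses the standard facility-location objective over a precomputed document-similarity matrix (nonnegative entries), and the names `facility_location` and `greedy_subset` are hypothetical. For a monotone submodular objective such as this one, the greedy rule of repeatedly adding the document with the largest marginal gain is known to achieve a (1 - 1/e) approximation of the optimal subset of size k.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], selected: List[int]) -> float:
    """Hypothetical objective: f(S) = sum_i max_{j in S} sim[i][j], with f(empty set) = 0.

    Each document i is "represented" by its most similar selected document, so
    f rewards subsets that cover the whole corpus rather than redundant picks.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_subset(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k documents, each time adding the largest marginal gain.

    Ties are broken in favour of the lowest document index.
    """
    n = len(sim)
    selected: List[int] = []
    current = 0.0
    for _ in range(min(k, n)):
        best_j, best_gain = -1, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding document j to the current subset.
            gain = facility_location(sim, selected + [j]) - current
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        current += best_gain
    return selected


# Toy corpus of four documents forming two loose clusters: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.75, 0.25, 0.0],
    [0.75, 1.0, 0.0, 0.25],
    [0.25, 0.0, 1.0, 0.5],
    [0.0, 0.25, 0.5, 1.0],
]
subset = greedy_subset(sim, k=2)
print(subset)  # [0, 2]
```

With k = 2, the second pick comes from the other cluster rather than from document 1, because a near-duplicate of document 0 adds little marginal coverage. This diminishing-returns behaviour is what makes submodular objectives suited to reducing redundancy in a selected subset.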
