Abstract

Lexicon-based methods determine the polarity of a document from the Sentiment Orientation (SO) scores of the words it contains. These SO scores are obtained from sentiment lexicons such as MPQA and SentiWordNet, which are built using methods such as Pointwise Mutual Information (PMI) to calculate the SO score of each word. The more popular PMI-based methods for creating lexicons do not make use of the rich information in star ratings, which accompany text documents such as the reviews available for various product categories on online platforms like Amazon, yelp.com or IMDB. Only recently have some studies used star ratings to calculate the SO values of words in reviews in order to develop domain-specific sentiment lexicons. This paper also makes use of star ratings but proposes a novel approach, 'SentiDraw', in which the probability distribution of words across reviews with different star ratings is used to calculate their SO scores. A comprehensive assessment of SentiDraw's performance across multiple domains and datasets is also presented, comparing it with other methods that use the star ratings of reviews to build lexicons. The results show that lexicons built with the SentiDraw method deliver superior performance to other lexicons in six out of nine cases. The accuracy of sentiment classification using the SentiDraw method ranges from 78.0% to 81.6% across domains, and the SentiDraw lexicon built using the Hollywood dataset outperforms every other purely lexicon-based method known to the authors on the most frequently studied datasets, such as the Cornell Movie Reviews Dataset (CMRD) and the Large Movie Review Dataset (LMRD). Finally, a hybrid approach is proposed that uses SentiDraw along with supervised methods to deliver state-of-the-art performance for polarity determination of reviews.
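
As a rough illustration of the generic lexicon-based classification described in the opening sentence, the sketch below sums the SO scores of the words in a text and reads the polarity off the sign of the total. It is not the SentiDraw scoring method, whose formula the abstract does not give. The lexicon entries, the tokenizer, the `polarity` function name, and the choice to label a zero total as positive are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon (assumed values, not taken from MPQA,
# SentiWordNet, or SentiDraw): word -> SO score, where a positive
# score marks positive sentiment and a negative score the opposite.
TOY_LEXICON = {
    "great": 1.0,
    "enjoyable": 0.8,
    "boring": -0.9,
    "terrible": -1.0,
}


def polarity(text, lexicon):
    """Return (label, total) for a text using summed SO scores.

    Words missing from the lexicon contribute 0. A total of exactly 0
    is labeled "positive" here; that tie-break is an arbitrary assumption.
    """
    # Simple lowercase word tokenizer (an assumption for this sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    label = "positive" if total >= 0 else "negative"
    return label, total


if __name__ == "__main__":
    print(polarity("A great and enjoyable film", TOY_LEXICON))
    print(polarity("Boring plot and terrible acting", TOY_LEXICON))
```

In a domain-specific setting, the toy dictionary would be replaced by a lexicon learned from reviews in that domain, with the classification step left unchanged.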
