Comparing Sentiment Analysis and Document Representation Methods of Amazon Reviews

Katic Tamara, Nemanja Milicevic

doi:10.1109/sisy.2018.8524814

Abstract

In the last few years, sentiment analysis has made considerable progress. It has been used in many applications to identify people's opinions of products, brands, services, etc., which can, for example, help a company improve its business. Some of these applications claim to use more effective document representation models than simple Information Retrieval approaches such as the bag-of-words representation. Interest in document representation models has grown because they address some of the limitations of the bag-of-words representation. In this paper, several sentiment analysis and document representation methods are compared on Amazon reviews. We use traditional models, namely bag-of-words, bag-of-n-grams and their TF-IDF variants combined with linear classifiers such as Logistic Regression and SVM, as well as deep learning models such as word-based convolutional neural networks (ConvNets) and a simple long short-term memory (LSTM) recurrent neural network. We also test various document representation techniques, such as Paragraph Vector and the use of pre-trained Word2Vec and GloVe word embeddings to compute a vector for each word in the document, with the word vectors aggregated using the element-wise mean. We show that deep learning models perform better than traditional models on our large dataset, with the LSTM achieving the best accuracy of 95.55%. Deep learning models generally perform better than traditional models as the training set size increases. Our best-performing model can be used for automatic sentiment classification of future product reviews in retail stores.
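To make the aggregation step concrete, the following is a minimal Python sketch of turning per-word vectors into a single document vector by taking their element-wise mean. The embedding table `TOY_EMBEDDINGS`, its 3-dimensional size, the helper name `document_vector`, and the choice to skip out-of-vocabulary words are all illustrative assumptions, not details taken from the paper; in practice the table would hold pre-trained Word2Vec or GloVe vectors, which are typically 100 to 300 dimensional.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings standing in for pre-trained Word2Vec/GloVe vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "product": [0.1, 0.8, 0.2],
    "terrible": [-0.8, 0.0, 0.1],
}


def document_vector(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Return the element-wise mean of the vectors of in-vocabulary tokens.

    Assumption: tokens absent from the embedding table are skipped, and a
    document with no known tokens maps to the zero vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # Average each coordinate independently across all known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    review = "great product".split()
    print(document_vector(review, TOY_EMBEDDINGS, 3))
```

The resulting fixed-length vector can then be fed to any of the linear classifiers mentioned above; averaging discards word order, which is one reason order-aware models such as the LSTM may do better.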
