Analyzing online product reviews has drawn much interest in academia. In this research, a new probabilistic topic model, called the tag sentiment aspect (TSA) model, is proposed on the basis of latent Dirichlet allocation (LDA); it aims to reveal the latent aspects and their corresponding sentiment in a review simultaneously. Unlike other topic models, which consider only the words in online reviews, TSA also takes syntax tags as visible information; in this research, part-of-speech (POS) tags, a widely used kind of syntax information, are considered first. Specifically, POS tags are integrated into three versions of the implementation, in consideration of the fact that words with different POS tags might be used to express consumers' opinions. Moreover, the proposed TSA is an unsupervised approach: only a small number of positive and negative words are required to set different priors for training. Finally, two large datasets of reviews on digital SLR cameras and laptops are used to evaluate the performance of the proposed model in terms of sentiment classification and aspect extraction. Comparative experiments show that the new model not only achieves promising results on sentiment classification but also improves performance on aspect extraction.