Abstract

Feature extraction is one of the key steps for text sentiment analysis (SA), and the corresponding algorithms have important effect on the results. In the paper, a novel methodology is proposed to extract the feature for SA of product reviews. First, based on the diversified expression forms of product reviews, the generalized TF–IDF feature vectors are obtained by introducing the semantic similarity of synonyms. Then, in view of the different lengths of product reviews, the local patterns of the feature vectors are identified with OPSM biclustering algorithm. Finally, we improve PrefixSpan algorithm to detect the frequent and pseudo-consecutive phrases with high discriminative ability (namely FPCD phrases), which contain word-order information. Furthermore, some important factors, such as the separation and discriminative ability of words, are also employed to improve the discriminative ability of sentiment polarity. Based on the previous steps, the text feature vectors are extracted. A series of the experiment and comparison results indicate that the performance for SA on product review is greatly improved.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call