Abstract
Feature extraction is one of the key steps for text sentiment analysis (SA), and the corresponding algorithms have important effect on the results. In the paper, a novel methodology is proposed to extract the feature for SA of product reviews. First, based on the diversified expression forms of product reviews, the generalized TF–IDF feature vectors are obtained by introducing the semantic similarity of synonyms. Then, in view of the different lengths of product reviews, the local patterns of the feature vectors are identified with OPSM biclustering algorithm. Finally, we improve PrefixSpan algorithm to detect the frequent and pseudo-consecutive phrases with high discriminative ability (namely FPCD phrases), which contain word-order information. Furthermore, some important factors, such as the separation and discriminative ability of words, are also employed to improve the discriminative ability of sentiment polarity. Based on the previous steps, the text feature vectors are extracted. A series of the experiment and comparison results indicate that the performance for SA on product review is greatly improved.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.