Abstract
Spam reviews are increasingly appearing on the Internet to promote sales or defame competitors by misleading consumers with deceptive opinions. This paper proposes a co-training approach called CoSpa (Co-training for Spam review identification) to identify spam reviews by two views: one is the lexical terms derived from the textual content of the reviews and the other is the PCFG (Probabilistic Context-Free Grammars) rules derived from a deep syntax analysis of the reviews. Using SVM (Support Vector Machine) as the base classifier, we develop two strategies, CoSpa-C and CoSpa-U, embedded within the CoSpa approach. The CoSpa-C strategy selects unlabeled reviews classified with the largest confidence to augment the training dataset to retrain the classifier. The CoSpa-U strategy randomly selects unlabeled reviews with a uniform distribution of confidence. Experiments on the spam dataset and the deception dataset demonstrate that both the proposed CoSpa algorithms outperform the traditional SVM with lexical terms and PCFG rules in spam review identification. Moreover, the CoSpa-U strategy outperforms the CoSpa-C strategy when we use the absolute value of decision function of SVM as the confidence.
Highlights
With the spread of the Internet and Web 2.0, it has been accepted that user-generated content (UGC) contains valuable clues that can be exploited for various applications such as e-commerce [1], tourism [2], software development [3], etc
Using the spam dataset provided by Ott et al [5], we find there is a difference in the expressions and PCFG rules used by deceptive and truthful reviews
We see that using the feature set word by terms with PCFG
Summary
With the spread of the Internet and Web 2.0, it has been accepted that user-generated content (UGC) contains valuable clues that can be exploited for various applications such as e-commerce [1], tourism [2], software development [3], etc. Online reviews, which refer to UGCs of opinions of products that users have purchased via e-commerce, are very popular nowadays. These reviews are widely used by potential customers to seek opinions of existing users before making a purchase decision. If one has to choose between two products of the same type, he or she is very likely to buy the product with more positive reviews from existing users. Positive reviews can boost the reputation of a product and create a competitive advantage over other products. Negative reviews can damage the reputation of a product and results in a competitive loss
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.