Abstract

Opinion spam detection is concerned with identifying fake reviews that are deliberately posted to promote or discredit a product. Opinionated social media such as product reviews are an increasingly important resource for individuals and businesses in the decision-making process, and they can easily be manipulated by opportunistic individuals. To counter the growing impact of opinion spam, several detection approaches have been proposed, most of which rely on supervised classification. In practice, however, most of the available data is unlabeled, so semi-supervised learning approaches are needed instead. This study therefore analyzes the effectiveness of several semi-supervised learning approaches for opinion spam classification. Four semi-supervised methods are evaluated on a dataset of genuine and deceptive hotel reviews, and the results are compared with several traditional supervised classification methods trained on the same amount of labeled data. In this study, the self-training algorithm with Naive Bayes as the base classifier achieves 93% accuracy. Of the four semi-supervised models tested, self-training is the only one that outperforms traditional supervised classification models when limited labeled data is available. The study further shows that self-training can reduce labeling effort while retaining high model performance, which is useful when little labeled data is available or obtaining labels is costly.
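To make the self-training idea concrete, here is a minimal sketch of self-training with a multinomial Naive Bayes base classifier, written in plain Python so it has no dependencies. It is not the authors' implementation: the abstract does not specify the features, confidence threshold, or number of rounds, so this sketch assumes whitespace-tokenized bag-of-words features, a confidence threshold of 0.9, and at most 10 rounds. The names `MultinomialNB`, `self_train`, `threshold`, and `max_iter` are hypothetical illustrations. The loop trains on the labeled reviews, pseudo-labels every unlabeled review whose predicted class probability reaches the threshold, adds those reviews to the training set, retrains, and stops when no new review is confident enough.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed feature extraction: lowercase whitespace tokens (bag of words).
    return text.lower().split()

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over word counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.word_counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            self.word_counts[c] = counts
            self.totals[c] = sum(counts.values())
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = s
        # Normalize log scores into probabilities (max-subtraction for stability).
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

def self_train(labeled, unlabeled, threshold=0.9, max_iter=10):
    """Hypothetical self-training loop: returns (final model, still-unlabeled docs)."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = MultinomialNB().fit(docs, labels)
    for _ in range(max_iter):
        confident, remaining = [], []
        for d in pool:
            probs = model.predict_proba(d)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((d, label))  # accept the pseudo-label
            else:
                remaining.append(d)
        if not confident:
            break  # nothing new is confident enough; stop early
        for d, y in confident:
            docs.append(d)
            labels.append(y)
        pool = remaining
        model = MultinomialNB().fit(docs, labels)  # retrain on the enlarged set
    return model, pool
```

For example, with two made-up toy reviews labeled `"deceptive"` and `"truthful"` and two unlabeled ones, `self_train(labeled, unlabeled)` pseudo-labels both unlabeled reviews in the first round, while raising `threshold` to 0.99 leaves them unlabeled. The threshold is the key design choice: a high value adds fewer but more reliable pseudo-labels. scikit-learn's `SelfTrainingClassifier` offers a comparable wrapper around any probabilistic base estimator.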

Highlights

  • Opinionated social media such as product reviews have become an important resource for individuals and organizations in the decision-making process

  • Additional experiments: the supervised learning experiments are repeated on two additional datasets

  • The effectiveness of semi-supervised learning methods for opinion spam classification is explored using the gold-standard hotel review dataset developed by Ott et al. (2011) and two additional Yelp review datasets


Summary

Introduction

Opinionated social media such as product reviews have become an important resource for individuals and organizations in the decision-making process. The rise of e-commerce platforms has led to enormous growth in the number of opinions shared online, and as a result, opinion spam detection has become a prominent issue. Clues for identifying spammers are usually hidden in multiple aspects, such as content, behavior, relationships, and interaction with the review [5]. Opinion spam detection therefore aims to identify the various features associated with a fake review. The most widely available feature is the review content, that is, the actual text of the review. The review's metadata can also reveal valuable information, and real-life knowledge about the product could reveal further spammer clues.

