Abstract

Online reviews of products and services have become the main source for gauging public opinion. Consequently, manufacturers and sellers are extremely concerned with customer reviews, as these have a direct impact on their businesses. Unfortunately, to gain profit or fame, spam reviews are written to promote or demote targeted products or services. This practice is known as review spamming. In recent years, the spam review detection problem has gained much attention from communities and researchers, but there is still a need for experiments on real-world, large-scale review datasets. Such experiments can help to analyze the impact of widespread opinion spam in online reviews. In this work, two spam review detection methods are proposed: (1) Spam Review Detection using Behavioral Method (SRD-BM), which uses thirteen spammer behavioral features to calculate a review spam score that is then used to identify spammers and spam reviews, and (2) Spam Review Detection using Linguistic Method (SRD-LM), which works on the content of the reviews and uses transformation, feature selection and classification to identify spam reviews. Experimental evaluations are conducted on a real-world Amazon review dataset comprising 26.7 million reviews and 15.4 million reviewers. The evaluations show that both proposed models significantly improve the detection of spam reviews. Specifically, SRD-BM achieved 93.1% accuracy whereas SRD-LM achieved 88.5% accuracy in spam review detection. SRD-BM achieved better accuracy because it draws on a rich set of spammer behavioral features from the review dataset, which provides an in-depth analysis of spammer behavior. Moreover, both proposed models outperformed existing approaches in accurately identifying spam reviews.
To the best of our knowledge, this is the first study of its kind that uses a large-scale review dataset to analyze different spammer behavioral features together with a linguistic method that employs different available classifiers.
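
As a rough illustration of the behavioral idea summarized above (per-reviewer features combined into a spam score that is then thresholded), a minimal Python sketch follows. The abstract does not list the thirteen features or say how they are combined, so everything here is an assumption made for illustration: the three features, the `daily_cap` normalization, the equal-weight average and the `threshold` value are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class ReviewerStats:
    """Hypothetical per-reviewer aggregates (not the paper's thirteen features)."""
    max_reviews_per_day: int     # most reviews posted on a single day
    extreme_rating_ratio: float  # share of 1-star or 5-star ratings, in [0, 1]
    duplicate_ratio: float       # share of near-duplicate review texts, in [0, 1]


def spam_score(stats: ReviewerStats, daily_cap: int = 10) -> float:
    """Average the features after scaling each to [0, 1].

    Equal weighting and the daily_cap value are assumptions for this sketch.
    """
    burst = min(stats.max_reviews_per_day / daily_cap, 1.0)
    features = [burst, stats.extreme_rating_ratio, stats.duplicate_ratio]
    return sum(features) / len(features)


def is_spammer(stats: ReviewerStats, threshold: float = 0.5) -> bool:
    """Flag a reviewer whose score reaches the (assumed) threshold."""
    return spam_score(stats) >= threshold


# Toy reviewers invented for illustration only.
suspicious = ReviewerStats(max_reviews_per_day=12,
                           extreme_rating_ratio=0.9,
                           duplicate_ratio=0.6)
typical = ReviewerStats(max_reviews_per_day=1,
                        extreme_rating_ratio=0.3,
                        duplicate_ratio=0.0)
```

With these toy inputs, `is_spammer(suspicious)` is true and `is_spammer(typical)` is false. In practice the weights and threshold would need to be tuned on labeled data.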

Highlights

  • Nowadays, the World Wide Web (WWW) has become the main source for individuals to express themselves

  • Table shows that Spam Review Detection using Behavioral Method (SRD-BM) outperforms the existing approaches by achieving an accuracy of 92% on the Yelp dataset

  • Evaluation of classification algorithms using Spam Review Detection using Linguistic Method (SRD-LM): four classification algorithms are evaluated using SRD-LM with different N-gram combinations coupled with various Information Gain (IG) variations
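
To make the pairing of N-grams with Information Gain mentioned in the last highlight concrete, here is a small, standard-library-only Python sketch. It treats each N-gram as a binary present/absent feature and ranks N-grams by the standard information gain formula (class entropy minus conditional entropy given the feature). The whitespace tokenizer, the helper names and the four toy reviews are assumptions for illustration; the source does not describe the authors' exact transformation or the IG variations they tried.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, feature, n):
    """IG of the binary feature 'document contains this n-gram'."""
    present = [y for d, y in zip(docs, labels) if feature in ngrams(d, n)]
    absent = [y for d, y in zip(docs, labels) if feature not in ngrams(d, n)]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def top_features(docs, labels, n, k):
    """Return the k n-grams with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*(ngrams(d, n) for d in docs))
    ranked = sorted(vocab,
                    key=lambda f: (-information_gain(docs, labels, f, n), f))
    return ranked[:k]


# Toy reviews and labels invented for illustration only.
docs = [
    "best product ever buy now",
    "buy now best deal ever",
    "battery lasts two days",
    "screen cracked after two weeks",
]
labels = ["spam", "spam", "genuine", "genuine"]
```

On this toy data the bigram "buy now" appears in both spam reviews and in no genuine review, so it separates the classes perfectly and `top_features(docs, labels, 2, 1)` returns it. The selected N-grams would then be fed to a classifier, which this sketch leaves out.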


Summary

Introduction

The World Wide Web (WWW) has become the main source for individuals to express themselves. People can share their views about any product or service by using e-commerce sites, forums and blogs. The importance of these online reviews for both customers and vendors is widely acknowledged. Most people read reviews about products and services before buying them. Vendors can design their future production or marketing strategies based on these reviews [1]. If various customers buying a specific model of a laptop post reviews about issues
