Effective Opinion Spam Detection: A Study on Review Metadata Versus Content

Ajay Rastogi, Monica Mehrotra, Syed Shafat Ali

doi:10.2478/jdis-2020-0013

Open Access

https://doi.org/10.2478/jdis-2020-0013

Journal: Journal of Data and Information Science
Publication Date: Apr 1, 2020
Citations: 15
License type: CC BY-NC-ND 4.0

Affiliation: Jamia Millia Islamia

Abstract

Purpose: This paper analyzes the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.

Design/methodology/approach: Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection), and product-centric (spam-targeted product detection). To counter classifier bias, we employ four classifiers so that the results do not reflect the behavior of any single one. We also propose a new set of features and compare them against those of some well-known related works. Experiments on two real-world datasets show the effectiveness of different features in opinion spam detection.

Findings: Behavioral features are both more efficient and more effective than textual features at detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on existing features used in related works. Furthermore, a computation-time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual ones.

Research limitations: The analyses in this paper are limited to two well-known datasets from Yelp.com, YelpZip and YelpNYC.

Practical implications: The results can be used to improve the detection of opinion spam; researchers may focus on developing feature engineering and selection techniques centered on metadata information.

Originality/value: To the best of our knowledge, this study is the first to consider three perspectives (review-, reviewer-, and product-centric) and four classifiers in analyzing the effectiveness of two major types of features for opinion spam detection. It also introduces some novel features that help improve the performance of opinion spam detection methods.
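
The distinction between behavioral and textual features, and the three detection perspectives, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record fields, the sample data, and the four feature names (rating_deviation, reviewer_review_count, word_count, exclamation_count) are hypothetical and are not taken from the paper or from the YelpZip/YelpNYC schema. The sketch only shows how review-level features might be computed from metadata versus text and then aggregated per reviewer or per product; it does not train any classifier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical review records; field names are assumptions for illustration.
reviews = [
    {"reviewer": "u1", "product": "p1", "rating": 5, "text": "Great food, great service!"},
    {"reviewer": "u1", "product": "p2", "rating": 5, "text": "Best place ever"},
    {"reviewer": "u2", "product": "p1", "rating": 2, "text": "Slow service and cold food."},
    {"reviewer": "u3", "product": "p1", "rating": 3, "text": "Average, nothing special."},
]


def behavioral_features(review, all_reviews):
    """Metadata-only features: uses ratings and IDs, never the review text."""
    product_ratings = [r["rating"] for r in all_reviews if r["product"] == review["product"]]
    reviewer_count = sum(1 for r in all_reviews if r["reviewer"] == review["reviewer"])
    return {
        # How far this rating is from the product's average rating.
        "rating_deviation": abs(review["rating"] - mean(product_ratings)),
        # How many reviews this reviewer has written in the dataset.
        "reviewer_review_count": reviewer_count,
    }


def textual_features(review):
    """Content-only features: uses the review text, never the metadata."""
    return {
        "word_count": len(review["text"].split()),
        "exclamation_count": review["text"].count("!"),
    }


def hybrid_features(review, all_reviews):
    """Hybrid setting: behavioral and textual features side by side."""
    return {**behavioral_features(review, all_reviews), **textual_features(review)}


def aggregate(all_reviews, key):
    """Average review-level hybrid features per reviewer or per product."""
    groups = defaultdict(list)
    for r in all_reviews:
        groups[r[key]].append(hybrid_features(r, all_reviews))
    return {
        k: {name: mean(f[name] for f in feats) for name in feats[0]}
        for k, feats in groups.items()
    }


review_centric = [hybrid_features(r, reviews) for r in reviews]  # one row per review
reviewer_centric = aggregate(reviews, "reviewer")                # one row per reviewer
product_centric = aggregate(reviews, "product")                  # one row per product
```

In a setup like this, the resulting feature rows for each perspective could be passed to any set of classifiers, once with only the behavioral columns, once with only the textual columns, and once with both, to compare the feature types under the same conditions.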

Similar Papers

Impact of Behavioral and Textual Features on Opinion Spam Detection
Ajay Rastogi ... Monica Mehrotra
01 Jun 2018

Opinion spam detection framework using hybrid classification scheme
Muhammad Zubair Asghar ... Aurangzeb Khan
Soft Computing | VOL. 24
11 Jun 2019

Opinion Spam Detection in Online Reviews Using Neural Networks
K Archchitha ... E.Y.A Charles
01 Sep 2019

Opinion spam detection: Using multi-iterative graph-based model
Shirin Noekhah ... Nor Hawaniah Zakaria
Information Processing & Management | VOL. 57
18 Oct 2019
