Fast Detection of Deceptive Reviews by Combining the Time Series and Machine Learning

Minjuan Zhong, Xilong Qu, Shengzong Liu, Zhenjin Li, Bo Yang, Rui Tan

doi:10.1155/2021/9923374

Open Access

https://doi.org/10.1155/2021/9923374

Abstract

With the rapid growth of online product reviews, many users consult others’ opinions before deciding to purchase a product. Unfortunately, this reliance has encouraged the widespread posting of fake reviews, which leads to many wrong purchase decisions. The effective identification of deceptive reviews has therefore become a crucial yet challenging task. Existing supervised learning methods require a large number of examples of deceptive and truthful opinions labeled by domain experts, while the available unsupervised learning methods are inefficient because they depend on reviewer features to detect each fake review. To address both the detection efficiency problem and the dependence on large amounts of labeled examples, this paper proposes a semisupervised learning approach for detecting spam reviews. First, a time series model of all the reviews of a product is constructed, and suspicious time intervals are captured based on bursts in the number of reviews in these intervals. Second, a co-training two-view semisupervised learning algorithm is run in each captured interval, in which linguistic cues, metadata, and user purchase behaviors are jointly used to classify each review as spam or not. A series of numerical experiments on a real dataset acquired from Taobao.com confirms the effectiveness of the proposed model: it is time-efficient and accurate, and it overcomes the main shortcoming of supervised learning methods, namely their dependence on large amounts of labeled examples. The model thus achieves a trade-off between accuracy and efficiency.
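The first step described in the abstract, building a per-product review time series and flagging intervals with a burst of reviews, can be illustrated with a minimal sketch. The abstract does not state the burst criterion, so the rule used here (daily count above mean plus `k` standard deviations), the daily granularity, the function names, and the sample data are all assumptions for illustration and not the authors' method.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(review_dates):
    """Build a gap-free daily time series of review counts for one product."""
    counts = Counter(review_dates)
    start, end = min(counts), max(counts)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return days, [counts.get(d, 0) for d in days]


def suspicious_intervals(days, series, k=2.0):
    """Return (first_day, last_day) runs whose count exceeds mean + k * std.

    The threshold rule and the value of k are assumptions, not from the paper.
    """
    threshold = mean(series) + k * pstdev(series)
    intervals, run_start = [], None
    for i, value in enumerate(series):
        if value > threshold and run_start is None:
            run_start = i  # a burst begins
        elif value <= threshold and run_start is not None:
            intervals.append((days[run_start], days[i - 1]))  # burst ended
            run_start = None
    if run_start is not None:  # burst runs to the end of the series
        intervals.append((days[run_start], days[-1]))
    return intervals


# Hypothetical data: one review per day for 20 days, plus a burst of
# extra reviews on the 11th and 12th of January.
base = date(2021, 1, 1)
dates = [base + timedelta(days=i) for i in range(20)]
dates += [base + timedelta(days=10)] * 14 + [base + timedelta(days=11)] * 11
days, series = daily_counts(dates)
bursts = suspicious_intervals(days, series)
print(bursts)
```

Only reviews that fall inside the returned intervals would then be passed to the classifier, which is where the efficiency gain described in the abstract would come from: the remaining reviews are never scored individually.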

Highlights

With the rapid growth of online product reviews, many users refer to others’ opinions before deciding to purchase any product. This fact has promoted the constant use of fake reviews, resulting in many wrong purchase decisions. The effective identification of deceptive reviews becomes a crucial yet challenging task in this research field. The existing supervised learning methods require a large number of examples of deceptive and truthful opinions labeled by domain experts, while the available unsupervised learning methods are inefficient because they depend on the features of reviewers to detect each fake review. Therefore, by focusing on the detection efficiency problem and the dependence on large amounts of labeled examples, in this paper, we proposed an effective semisupervised learning approach for detecting spam reviews.
A series of numerical experiments on a real dataset acquired from Taobao.com have confirmed the effectiveness of the proposed model, reaping benefits in terms of time efficiency and high accuracy and overcoming the shortcomings of supervised learning methods, which depend on large amounts of labeled examples.
In terms of accuracy, compared with the algorithm proposed by Atefeh, the hit rate of our model increased by 29%, the F1 measure increased by 18.8%, and the precision increased by 24.6%. The proposed model in this study is a semisupervised co-training method in which different classification models are trained on the dataset multiple times using different features and are then combined through ensemble voting. The algorithm proposed by Atefeh is an unsupervised identification method in which the features of fake reviews are not fully mined and the threshold setting is uncertain.
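The co-training idea mentioned in the highlights can be sketched as follows: two feature views of the same reviews take turns training a classifier on the currently labeled reviews and pseudo-labeling the unlabeled reviews they are most confident about. This is a simplified illustration only. The nearest-centroid classifier is a stand-in for whatever classifiers the paper uses, the final ensemble vote is omitted, and the feature values, function names, and parameters (`rounds`, `per_round`) are hypothetical.

```python
def centroid_fit(rows, labels):
    """Fit a nearest-centroid classifier; returns {label: centroid}."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def centroid_predict(model, row):
    """Return (label, margin); margin is the distance gap to the runner-up.

    Assumes the model contains at least two classes.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, c)) ** 0.5, y)
        for y, c in model.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Co-training over two views; labels[i] is 0/1, or None if unlabeled."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, y in enumerate(labels) if y is not None]
            unknown = [i for i, y in enumerate(labels) if y is None]
            if not unknown:
                return labels
            model = centroid_fit([view[i] for i in known],
                                 [labels[i] for i in known])
            # Pseudo-label the reviews this view is most confident about.
            scored = sorted(
                ((centroid_predict(model, view[i]), i) for i in unknown),
                key=lambda t: -t[0][1],
            )
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


# Hypothetical views: A = a linguistic score, B = a purchase-behavior score.
view_a = [[0.9], [0.1], [0.8], [0.2], [0.85], [0.15]]
view_b = [[5.0], [1.0], [4.0], [1.5], [4.5], [0.5]]
seed_labels = [1, 0, None, None, None, None]  # 1 = spam, 0 = truthful
result = co_train(view_a, view_b, seed_labels)
print(result)
```

Starting from only two labeled reviews, each view extends the labeled set that the other view trains on in the next step, which is how co-training reduces the need for a large expert-labeled dataset.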

Summary

Introduction

With the rapid growth of online product reviews, many users refer to others’ opinions before deciding to purchase any product. The existing supervised learning methods require a large number of examples of deceptive and truthful opinions labeled by domain experts, while the available unsupervised learning methods are inefficient because they depend on the features of reviewers to detect each fake review. Therefore, by focusing on the detection efficiency problem and the dependence on large amounts of labeled examples, in this paper, we proposed an effective semisupervised learning approach for detecting spam reviews. Driven by competition and vested interests, many vendors and retailers try to manipulate online reviews. They tend to post deceptive reviews in an attempt to mislead potential consumers into risky purchasing decisions. It is difficult to quickly identify each fake review by relying on the features of reviewers.
