Detecting Spammer Groups From Product Reviews: A Partially Supervised Learning Model

Lu Zhang, Jie Cao, Zhiang Wu

doi:10.1109/access.2017.2784370


Open Access

https://doi.org/10.1109/access.2017.2784370


Abstract

Online product reviews play a crucial role in consumers' purchase decisions. A high proportion of positive reviews brings substantial sales growth, while negative reviews cause sales loss. Driven by the immense financial profits, many spammers try to promote their own products or demote their competitors' products by posting fake and biased online reviews. By registering multiple accounts or posting tasks on crowdsourcing platforms, individual spammers can be organized into spammer groups that manipulate product reviews together, which can be more damaging. Existing work on spammer group detection extracts spammer group candidates from review data and identifies the real spammer groups using unsupervised spamicity ranking methods. According to previous research, however, labeling a small number of spammer groups is easier than one might assume, yet few methods make good use of this labeled data. In this paper, we propose a partially supervised learning model (PSGD) to detect spammer groups. By labeling some spammer groups as positive instances, PSGD applies positive-unlabeled learning (PU-Learning) to learn a classifier that serves as a spammer group detector from positive instances (labeled spammer groups) and unlabeled instances (unlabeled groups). Specifically, we extract a reliable negative set based on the positive instances and the distinctive features. By combining the positive instances, the extracted negative instances, and the unlabeled instances, we convert the PU-Learning problem into the well-known semi-supervised learning problem, and then use a Naive Bayesian model and an EM algorithm to train a classifier for spammer group detection. Experiments on a real-life Amazon.cn data set show that the proposed PSGD is effective and outperforms state-of-the-art spammer group detection methods.
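The pipeline outlined in the abstract (reliable negative extraction, then Naive Bayes refined with EM) can be illustrated with a small sketch. This is not the authors' implementation: the distance-to-centroid rule for picking reliable negatives, the Gaussian feature model, all function names, and the synthetic two-feature groups are assumptions made purely for illustration.

```python
import math
import random

def fit_weighted_gaussian_nb(X, w_pos):
    """Fit a two-class Gaussian Naive Bayes model from soft labels.
    w_pos[i] is the current probability that row i is a spammer group."""
    model = {}
    for label, weights in (("pos", w_pos), ("neg", [1.0 - w for w in w_pos])):
        total = sum(weights) + 1e-9
        means, variances = [], []
        for j in range(len(X[0])):
            mean = sum(w * x[j] for w, x in zip(weights, X)) / total
            var = sum(w * (x[j] - mean) ** 2 for w, x in zip(weights, X)) / total
            means.append(mean)
            variances.append(var + 1e-3)  # variance floor for numerical stability
        model[label] = (total / len(X), means, variances)
    return model

def posterior_pos(model, x):
    """Return P(spammer group | x) under the fitted model."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_scores[label] = s
    top = max(log_scores.values())
    e_pos = math.exp(log_scores["pos"] - top)
    e_neg = math.exp(log_scores["neg"] - top)
    return e_pos / (e_pos + e_neg)

def extract_reliable_negatives(P, U, fraction=0.3):
    """Assumed heuristic: the unlabeled rows farthest from the centroid
    of the labeled positives are treated as reliable negatives."""
    centroid = [sum(col) / len(P) for col in zip(*P)]
    def sq_dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    order = sorted(range(len(U)), key=lambda i: sq_dist(U[i]), reverse=True)
    return set(order[:max(1, int(fraction * len(U)))])

def train_pu_detector(P, U, n_iter=10, fraction=0.3):
    """Positives stay at 1, reliable negatives stay at 0, and EM
    re-estimates soft labels for the remaining unlabeled rows."""
    rn = extract_reliable_negatives(P, U, fraction)
    rn_rows = [U[i] for i in sorted(rn)]
    rest = [U[i] for i in range(len(U)) if i not in rn]
    fixed = [1.0] * len(P) + [0.0] * len(rn_rows)
    model = fit_weighted_gaussian_nb(P + rn_rows, fixed)  # initial classifier
    X = P + rn_rows + rest
    for _ in range(n_iter):
        soft = [posterior_pos(model, x) for x in rest]    # E-step
        model = fit_weighted_gaussian_nb(X, fixed + soft)  # M-step
    return model

# Synthetic demo: two hypothetical group-level features per candidate group.
random.seed(0)
def sample(center, n):
    return [[random.gauss(center, 0.08), random.gauss(center, 0.08)] for _ in range(n)]

spam, genuine = sample(0.8, 40), sample(0.2, 60)
P, U = spam[:10], spam[10:] + genuine  # only 10 spammer groups are labeled
model = train_pu_detector(P, U)
print(posterior_pos(model, [0.8, 0.8]), posterior_pos(model, [0.2, 0.2]))
```

In this sketch the labeled positives and reliable negatives keep fixed labels across iterations, so EM only redistributes probability mass among the uncertain unlabeled groups. Whether the paper fixes or relaxes those labels is not stated in the abstract.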

Full Text