Abstract

As Community Question Answering (CQA) sites have grown into popular knowledge-sharing platforms, they have also become attractive places for spammers to spread fake or promotional information. With the rapid development of crowdsourcing systems, many malicious users now launch organized spam campaigns, controlling numerous spam accounts to carry out collusive spamming on CQA sites. In these campaigns, spammers do not act independently but post deceptive questions and answers (Q&As) collaboratively. As a result, the Q&As are closely related to each other, while their spam clues are even less visible. Most existing spam detection methods may therefore fail to detect such carefully organized collusive CQA spam. In this paper, taking Baidu Zhidao, a popular Chinese CQA platform, as the object of study, we propose a Collective Classification framework for community Question Answering spam detection (CCQA), which collectively identifies collusive CQA spam using Q&A features and the correlations among Q&As. First, we define the Deceptive Pattern of Q&As, based on which real Q&A groups are extracted. Then, we extract several highly discriminative Q&A features at both the individual and group levels, and propose several types of correlations that link Q&As likely to share the same label. After uniformly modeling the Q&As, features, and correlations in an Attributed Heterogeneous Information Network (AHIN), we propose a semi-supervised collective classification algorithm to detect collusive Q&A spam. Experimental results on a real-life dataset demonstrate that CCQA detects collusive CQA spam accurately and outperforms a number of competitive baselines.
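The general idea of semi-supervised collective classification over correlated Q&As can be illustrated with a small sketch. This is not the CCQA algorithm itself, which the abstract does not specify; it is a generic score-propagation scheme under stated assumptions: each Q&A has a feature-based spam prior, correlations are weighted undirected edges whose endpoints all appear in `priors`, and a few Q&As carry known labels. The function name `collective_classify`, the mixing weight `alpha`, and the example data are all hypothetical.

```python
from collections import defaultdict


def collective_classify(priors, edges, labels, alpha=0.5, iterations=50):
    """Propagate spam scores over a Q&A correlation graph (illustrative only).

    priors: dict mapping node -> feature-based spam probability in [0, 1]
    edges:  list of (u, v, weight) correlations between nodes in `priors`
    labels: dict mapping node -> 0 (legitimate) or 1 (spam) for known nodes
    """
    # Build an undirected weighted adjacency list from the correlations.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Start from known labels where available, otherwise from the prior.
    scores = {n: float(labels.get(n, p)) for n, p in priors.items()}

    for _ in range(iterations):
        updated = {}
        for node, prior in priors.items():
            if node in labels:
                # Labeled nodes stay clamped to their known label.
                updated[node] = float(labels[node])
                continue
            total = sum(w for _, w in neighbors[node])
            if total == 0:
                # Isolated nodes fall back to their own features.
                updated[node] = prior
                continue
            neighbor_avg = sum(scores[m] * w for m, w in neighbors[node]) / total
            # Mix the node's own evidence with its neighbors' current scores.
            updated[node] = alpha * prior + (1 - alpha) * neighbor_avg
        scores = updated
    return scores


# Hypothetical data: two labeled Q&As and two unlabeled ones correlated with them.
priors = {"qa1": 0.9, "qa2": 0.5, "qa3": 0.5, "qa4": 0.1}
edges = [("qa1", "qa2", 1.0), ("qa3", "qa4", 1.0)]
labels = {"qa1": 1, "qa4": 0}
scores = collective_classify(priors, edges, labels)
```

In this toy case the two unlabeled Q&As have identical, uninformative priors, so their final scores differ only because of the Q&As they are correlated with. That is the intuition behind using correlations when individual spam clues are weak.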
