Blackmarket-Driven Collusion Among Retweeters–Analysis, Detection, and Characterization

Hridoy Sankar Dutta, Tanmoy Chakraborty

doi:10.1109/tifs.2019.2953331

Abstract

The growth of online social media has led to a huge increase in the number of users who want to share and publicize various kinds of information. Twitter, the most popular micro-blogging platform, has become a hotbed for users involved in activities such as news publishing, job hunting, recruiting, advertising and publicity. Retweeting is a major way to broadcast a user’s message to millions of users. A retweet has two major advantages: (i) gaining quick exposure for the content, and (ii) increasing the likelihood of gaining new Twitter followers in return. Gaining a large number of retweets organically is a time-consuming process, which has led to the creation of unfair methods to gain retweets. Thus, Twitter users often approach various blackmarket services to gain retweets inorganically in a short duration. Blackmarkets spread their collusive ecosystem in such a way that Twitter is unable to detect them even after devoting significant effort to purging the platform of bots, trolls, and fake accounts. One major reason behind the evasion is that the collusive users involved in blackmarket services exhibit a mix of organic and inorganic behavior – they organically retweet some genuine tweets; at the same time, they inorganically retweet tweets submitted to blackmarket services. This paper is the first attempt to provide a thorough study of the collusive users involved in two types of blackmarket services – Premium and Freemium. We collect a novel dataset of collusive users comprising users from both types of blackmarket services. We provide network-centric, profile-centric, timeline-centric and retweet-centric characteristics of these users and show how users involved in premium blackmarket services exhibit diverse behavior compared to those involved in freemium services. We further employ human annotators to label collusive users into three types: bots, promotional customers, and normal customers. We then curate 63 novel features to run state-of-the-art classifiers in two settings – binary classification (collusive vs. genuine) and multi-class classification (bots, promotional customers, normal customers, and genuine users). Bagging achieves the best performance (macro F1-score of 0.892) in the former setting, whereas Random Forest outperforms the others (macro F1-score of 0.791) in the latter setting. We also develop a Chrome extension, SCoRe++, which can detect collusive retweeters in real time.
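
The macro F1-score reported above is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of its size. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the eight toy labels are hypothetical and purely illustrative; they are not the paper's data, features, or evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for the four-class setting (hypothetical, not from the paper).
y_true = ["bot", "bot", "promotional", "promotional",
          "normal", "normal", "genuine", "genuine"]
y_pred = ["bot", "bot", "promotional", "normal",
          "normal", "normal", "genuine", "bot"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because the mean is unweighted, a classifier that does poorly on a small class (for example, promotional customers) is penalized as much as one that does poorly on a large class, which is why macro F1 is a common choice when class sizes differ.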

Full Text