Bot identification: Helping analysts for right data in twitter

T Velayutham, Pradeep Kumar Tiwari

doi:10.1109/icaccaf.2017.8344722

Abstract

As social networks become more popular, concerns are growing among data analysts about the quality of content on social media platforms. Data quality is important for sound and fair predictive analysis. Low-quality content may lead to an event being attributed to the wrong cause, produce misleading trending issues and, more importantly, cause sensitive stock prices to fluctuate. Content on social media may be flooded or corrupted by various kinds of bots, such as influence bots and spam bots. We target Twitter for the identification of such bots, as it is the platform most used by data scientists for applications related to scientific prediction and sentiment analysis. In this paper, we build on earlier approaches and use a machine learning based approach to classify a profile as a bot or a human. We identify 10 attributes of an account's user profile and tweet pattern and calculate a score, called botScore, for each profile to model its behavior as bot or human. We also extend the list of features used to distinguish bots from humans in order to support more fine-grained labels. The proposed method was found to be more accurate than the traditional Bayes classification technique.

Full Text