An enhanced graph-based semi-supervised learning algorithm to detect fake users on Twitter

M Balaanand, S Karthik, R Varatharajan, N Karthikeyan, Gunasekaran Manogaran, C B Sivaparthipan

doi:10.1007/s11227-019-02948-w

Abstract

Over the past decade, social networking services (SNS) have proliferated on the web. The nature of such sites makes identity deception easy: they provide a fast means of creating and managing identities, and then of connecting with and deceiving others. Fake users are accounts created specifically for purposes such as stalking or abusing another user, slander, or marketing. Current systems for detecting deception depend on behavioral, non-behavioral and user-generated content (UGC) information gathered from users. Although these methods have high detection accuracy, they cannot be applied to databases with massive volumes of data. To address this issue, this paper proposes an enhanced graph-based semi-supervised learning algorithm (EGSLA) to detect fake users in a large volume of Twitter data. The proposed method comprises four modules: data collection, feature extraction, classification and decision making. Data collected from Twitter using Scrapy is used for the evaluation. The performance of the proposed algorithm is compared with that of existing game theory, k-nearest neighbor (KNN), support vector machine (SVM) and decision tree techniques. The results show that the proposed EGSLA algorithm achieves 90.3% accuracy in spotting fake users.
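To illustrate the general idea behind graph-based semi-supervised classification, the following is a minimal sketch of generic label propagation in Python. It is not the EGSLA algorithm itself, whose details the abstract does not give. The function names (propagate_labels, classify), the toy graph, the initial score of 0.5 and the 0.5 decision threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, iterations=50):
    """Generic label propagation on an undirected graph (NOT the paper's EGSLA).

    edges: iterable of (u, v) pairs linking accounts (hypothetical relation,
           e.g. follows or mentions).
    seed_labels: dict mapping the few labeled accounts to 1.0 (fake) or
                 0.0 (genuine).
    Returns a dict mapping each account to a score in [0, 1];
    a higher score means the account looks more like the fake seeds.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Unlabeled accounts start at 0.5, i.e. no evidence either way (assumption).
    scores = {node: seed_labels.get(node, 0.5) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                # Labeled accounts are clamped to their known label.
                updated[node] = seed_labels[node]
            else:
                # Unlabeled accounts take the mean score of their neighbors.
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = updated
    return scores


def classify(scores, threshold=0.5):
    """Decision step: threshold the propagated scores (threshold is assumed)."""
    return {n: ("fake" if s > threshold else "genuine") for n, s in scores.items()}


# Toy graph: two triangles joined by one bridge edge, one labeled seed in each.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "e")]
seeds = {"a": 1.0, "d": 0.0}  # "a" is a known fake, "d" a known genuine account

scores = propagate_labels(edges, seeds)
labels = classify(scores)
```

In this toy example, the unlabeled accounts in the triangle containing the fake seed end up above the threshold and those in the other triangle end up below it. A real pipeline along the lines the abstract describes would first build the graph and features from collected Twitter data, which this sketch does not attempt.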

Full Text