Distributed and scalable Sybil identification based on nearest neighbour approximation using big data analysis techniques

Chinnaiah Valliyammai, Ramalingam Devakunchari

doi:10.1007/s10586-018-2314-9

Abstract

The problem of Sybil detection has been examined on multiple social media platforms such as Twitter, LinkedIn and Facebook. Detecting Sybils (fake accounts or social bots) across online social networks has become a major challenge with the rapid growth of these networks, which quickly generate very large data sets, termed big data. An open-source framework, Spark-based distributed, fast and scalable nearest neighbor search (S-DFS-NNS), is proposed for profile-based fake account detection across large-scale online social networks. The proposed work processes the nearest neighbor (NN) search problem efficiently in parallel. The performance of k-nearest neighbor (k-NN) search degrades significantly on huge data sets because the task is computationally demanding. The framework is fast and adapts to large-scale settings. Using in-memory computation, suspected users are identified based on a novel private feature. The Spark-DFS-NN search technique provides a substantial performance improvement over conventional nearest neighbor computation in large-scale networks. The proposed framework is evaluated on detection accuracy and is able to expose and block a large fraction of suspicious accounts during account creation. The proposed S-DFS-NN framework maintains roughly consistent performance of 89–95% as the number of attacks increases, with a latency of 58 ms.
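To make the idea of a partitioned nearest neighbor search concrete, the following is a minimal sketch in plain Python, not the authors' Spark implementation. It assumes a common map-and-merge pattern: each data partition returns its own k closest labeled profiles, and the local results are merged into a global top k before a majority vote. The function names (`local_top_k`, `partitioned_knn_label`), the two-dimensional feature vectors and the toy labels are all hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import Counter


def local_top_k(partition, query, k):
    """Return the k closest (distance, label) pairs within one partition."""
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def partitioned_knn_label(partitions, query, k):
    """Classify `query` by majority vote over its k nearest labeled profiles.

    Hypothetical sketch: in a distributed setting the per-partition step
    would run in parallel on separate workers; here it is a simple loop.
    """
    # "Map" step: each partition contributes at most k local candidates.
    candidates = []
    for partition in partitions:
        candidates.extend(local_top_k(partition, query, k))
    # "Merge" step: the global k nearest are always among the local top-k sets.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy profiles: (feature vector, label). The features are made-up numbers.
partition_a = [((0.90, 0.80), "genuine"), ((0.10, 0.20), "sybil")]
partition_b = [
    ((0.85, 0.75), "genuine"),
    ((0.15, 0.10), "sybil"),
    ((0.20, 0.15), "sybil"),
]

suspect = partitioned_knn_label([partition_a, partition_b], (0.12, 0.18), k=3)
regular = partitioned_knn_label([partition_a, partition_b], (0.88, 0.79), k=3)
```

Because every partition forwards only k candidates, the merge step handles at most k times the number of partitions items, regardless of how large each partition is. This sketch performs an exact search within each partition; an approximate method would replace `local_top_k` with a cheaper candidate selection.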

Full Text