Detecting spamming activities in twitter based on deep‐learning technique

Tingmin Wu, Sheng Wen, Mohammad Mehedi Hassan, Yang Xiang, Jun Zhang, Shigang Liu, Majed Alrubaian

doi:10.1002/cpe.4209

Abstract

Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 87%. However, because of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam activities in real-life scenarios. Meanwhile, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel deep learning-based technique to address these challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then construct a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10-day real tweet dataset as ground truth to evaluate our proposed method. We first carried out an empirical analysis with a series of comparisons: (1) the performance of different classifiers, (2) other existing text-based methods, and (3) non-text-based detection techniques. According to the experimental results, our proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to justify theoretically why our method performs better. We extracted all kinds of features via dimensionality reduction and found that our features were the most distinct among all the detection methods, which supports the superior performance of our method.

Full Text