Word Embedding for Social Bot Detection Systems

Zineb Ellaky, Faouzia Benabbou, Sara Ouahabi, Nawal Sael

doi:10.1109/icds53782.2021.9626752

Abstract

In recent years, the growth of online social networks (OSNs) has been phenomenal, with great social and economic impact. However, some accounts are created for malicious activities whose objective is to influence elections, sully reputations, spread fake information, or attack legitimate users. This problem has attracted practitioners and researchers alike, who have proposed solutions to prevent malicious activity on OSNs. Fake accounts can be managed by bots, but not all bots are malicious. In this paper, we study the impact of using word embedding with different Machine Learning (ML) techniques on social bot detection performance. The experiments are based only on comment features from the Cresci-17 dataset. For this purpose, we used widely used ML algorithms such as Logistic Regression (LR), Decision Tree (DT), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), AdaBoost, XGBoost, and MultiLayer Perceptron (MLP), with different word embedding methods such as BOW, TF-IDF, Doc2Vec, BERT, Word2Vec, and FastText. The results showed that the RF and DT algorithms achieved the highest precision of 99.96% with BERT, and that Doc2Vec achieved a precision score of 100%.
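To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It turns comment text into TF-IDF vectors, classifies each test comment with a 1-nearest-neighbour rule under cosine similarity (a simple instance of KNN), and reports precision for the bot class. The toy comments, the labels (1 = bot, 0 = human), and all function names are assumptions made for illustration; the paper's experiments use comment features from Cresci-17.

```python
import math
from collections import Counter

# Hypothetical toy data, NOT taken from Cresci-17. Label 1 = bot, 0 = human.
train = [
    ("win free followers click here", 1),
    ("free followers click this link now", 1),
    ("had a great lunch with my sister", 0),
    ("watching the game with friends tonight", 0),
]
test = [
    ("click here for free followers", 1),
    ("great game tonight with my friends", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector as a dict; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_1nn(vec, train_vecs):
    """Return the label of the most similar training vector (KNN with k = 1)."""
    return max(train_vecs, key=lambda item: cosine(vec, item[0]))[1]

def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the positive (bot) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

idf = fit_idf([text for text, _ in train])
train_vecs = [(tfidf(text, idf), label) for text, label in train]
y_true = [label for _, label in test]
y_pred = [predict_1nn(tfidf(text, idf), train_vecs) for text, _ in test]
bot_precision = precision(y_true, y_pred)
```

Swapping `tfidf` for another text representation, or `predict_1nn` for another classifier, leaves the rest of the sketch unchanged, which mirrors the embedding-by-classifier comparison the abstract describes.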

Full Text