Abstract

The detection and removal of malicious social bots in social networks has become an area of interest in both industry and academia. Widely used machine-learning-based bot detection methods must cope with an imbalance in the number of samples across categories, and the resulting classifier bias leads to a low detection rate for minority samples. We therefore propose an improved conditional generative adversarial network (improved CGAN) that extends imbalanced data sets before the classifiers are trained, improving the detection accuracy of social bots. To generate the auxiliary condition, we propose a modified clustering algorithm, the Gaussian kernel density peak clustering algorithm (GKDPCA), which avoids generating data-augmentation noise and eliminates imbalances both between and within the social bot class distributions. Furthermore, we improve the CGAN convergence criterion by introducing the Wasserstein distance with a gradient penalty, which addresses the mode collapse and vanishing gradients of the traditional CGAN. In experiments, we study how the imbalance degree and the expansion ratio of the original data affect oversampling, and we compare the improved CGAN with three common oversampling algorithms. The improved CGAN achieves higher F1-score, G-mean and AUC than all three.
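The abstract does not spell out the improved loss, so the block below is only a reference point: the standard WGAN-GP objective of Gulrajani et al. (2017), written with a conditioning variable y. It assumes that y is the auxiliary condition (for example, a cluster label produced by GKDPCA), that G is the generator, D the critic, and that λ = 10, which is the default suggested by Gulrajani et al. and not necessarily the value used in this work.

```latex
\begin{aligned}
\tilde{x} &= G(z \mid y), \qquad
\hat{x} = \epsilon x + (1-\epsilon)\,\tilde{x}, \quad \epsilon \sim U[0,1] \\
L_D &= \mathbb{E}_{\tilde{x}\sim \mathbb{P}_g}\big[D(\tilde{x}\mid y)\big]
     - \mathbb{E}_{x\sim \mathbb{P}_r}\big[D(x\mid y)\big]
     + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}\mid y)\rVert_2 - 1\big)^2\Big] \\
L_G &= -\,\mathbb{E}_{z\sim p_z}\big[D(G(z\mid y)\mid y)\big]
\end{aligned}
```

The penalty term pushes the critic's gradient norm toward 1 on points interpolated between real and generated samples, a soft version of the 1-Lipschitz constraint. With a well-trained critic, the quantity E[D(x | y)] − E[D(x̃ | y)] (the negated first two terms of L_D) approximates the Wasserstein-1 distance between the real and generated distributions, which is what makes it usable as a convergence indicator, unlike the saturating loss of a traditional CGAN.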

Highlights

  • In recent years, online social networks (OSNs), in which people can conveniently share and promote news, information, opinions, links, and products, have become widespread.

  • In this work, we propose a data-augmentation approach to address the imbalance in Twitter bot detection by adopting conditional generative adversarial networks (CGANs).

  • By combining the clustering algorithm with a CGAN to rebalance skewed data sets, we propose an effective oversampling method for imbalanced data that avoids generating data-augmentation noise and overcomes the imbalances between and within class distributions.
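The clustering step can be illustrated with the classic density peak clustering algorithm of Rodriguez and Laio (2014), using a Gaussian kernel for the local density, which is what the name GKDPCA suggests. This is a minimal sketch under stated assumptions: the paper's specific modifications, its choice of the cutoff distance `d_c`, and its rule for picking cluster centers are not described here, so `d_c`, `n_clusters`, and the helper `per_cluster_quota` (which splits the synthetic-sample budget evenly across minority clusters to counter within-class imbalance) are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def density_peak_clusters(points: List[Sequence[float]], d_c: float,
                          n_clusters: int) -> List[int]:
    """Density peak clustering with a Gaussian-kernel local density.

    rho_i   = sum_{j != i} exp(-(d_ij / d_c) ** 2)
    delta_i = distance to the nearest point of higher density
              (for the densest point: its largest distance to any point)
    The n_clusters points with the largest gamma_i = rho_i * delta_i become centers.
    """
    if d_c <= 0 or not 1 <= n_clusters <= len(points):
        raise ValueError("need d_c > 0 and 1 <= n_clusters <= len(points)")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Visit points in order of decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j

    # Largest gamma first; density rank breaks ties, so the densest point is
    # always selected as a center.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    # Each remaining point inherits the label of its nearest denser neighbour,
    # which is already labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


def per_cluster_quota(labels: List[int], n_target: int) -> Dict[int, int]:
    """Hypothetical helper: number of synthetic samples per cluster so that each
    cluster moves toward an equal share of n_target minority samples.
    Clusters already at or above their share receive no new samples."""
    sizes = Counter(labels)
    share, extra = divmod(n_target, len(sizes))
    return {c: max(0, share + (1 if r < extra else 0) - sizes[c])
            for r, c in enumerate(sorted(sizes))}
```

For example, for two well-separated groups of minority points, `[(0,0), (0,1), (1,0), (1,1), (0.5,0.5)]` and `[(10,10), (10,11), (11,10)]`, calling `density_peak_clusters(points, d_c=1.0, n_clusters=2)` labels the first five points 0 and the last three 1, and `per_cluster_quota(labels, 10)` requests two synthetic samples for the smaller cluster only. In a CGAN-based pipeline, such cluster labels could serve as the condition and the quotas as per-condition sample counts, but that wiring is an assumption, not a description of the authors' code.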


Summary

INTRODUCTION

Online social networks (OSNs), in which people can conveniently share and promote news, information, opinions, links, and products, have become widespread. Data augmentation is an important oversampling approach to the problem of data-set imbalance, and it has been applied in fields including computer vision, scene reconstruction, voice data augmentation, and natural language processing [12]. To overcome the difficulties posed by imbalanced data, in this work we propose a data-augmentation approach that addresses the imbalance in Twitter bot detection by adopting conditional generative adversarial networks (CGANs). The resulting model can be used to accurately judge the category of a social media account.

Other investigators proposed a method based on synthetic minority oversampling that enhances the existing data set by generating minority samples, improving classification performance. Benchaji et al. [69] proposed a sampling method based on k-means clustering and a genetic algorithm to improve the classification of imbalanced data sets for credit card fraud detection. With a balanced data set, the classifier can detect social bots more effectively.
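The synthetic minority oversampling idea mentioned above can be illustrated with a minimal sketch of classic SMOTE (Chawla et al., 2002). It is not the method of the cited investigators, whose details are not given here, and the function name `smote`, the default `k=5`, and the `seed` parameter are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Tuple[float, ...]]:
    """SMOTE-style oversampling: each synthetic sample is x + u * (x_nn - x),
    where x is a random minority sample, x_nn is one of its k nearest minority
    neighbours and u is drawn uniformly from [0, 1)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        x_nn = minority[rng.choice(neighbours)]
        u = rng.random()
        # Place the new point on the segment between x and its chosen neighbour.
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, x_nn)))
    return synthetic
```

For instance, `smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], n_new=20, k=2)` returns 20 points inside the unit square, since each lies on a segment between two minority samples. Because plain SMOTE interpolates between neighbours without looking at the majority class, it can place samples in regions where the classes overlap, which is one source of the data-augmentation noise that cluster-aware approaches aim to avoid.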

OVERALL PROCESS
EXPERIMENTAL PROCESS AND PARAMETER SETTING
Findings
CONCLUSION