Unsupervised Machine Learning for Bot Detection on Twitter: Generating and Selecting Features for Accurate Clustering

Raad Al-Azawi, Safaa O Al-Mamory

doi:10.4114/intartif.vol27iss73pp142-158

Abstract

Twitter is a popular social media platform that is widely used by individuals and businesses. However, it is vulnerable to bot attacks, which can have negative effects on society. Supervised machine learning techniques can detect bots, but they require labeled data to differentiate between human and bot users. Twitter generates a large volume of unlabeled data, which can be expensive to label. Unsupervised machine learning techniques, specifically clustering algorithms, are therefore crucial for handling this data and reducing computational complexity. Effective feature selection is necessary for clustering because some features are more important than others. This study aims to enhance feature reliability, introduce new features, and reduce the feature set in order to improve the accuracy of bot identification with clustering algorithms. The study achieved an accuracy of 0.99 with four clustering algorithms: agglomerative hierarchical clustering, k-medoids, DBSCAN, and k-means. This was accomplished by minimizing the dataset dimensions and selecting essential features. By employing unsupervised machine learning techniques, Twitter can detect and mitigate bot attacks more efficiently, which can positively impact society.
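
To make the workflow in the abstract concrete, the following is a minimal sketch of the general idea: drop uninformative features, cluster without labels, and then score the clusters against held-back labels. It is not the authors' pipeline. The synthetic data, the variance-based feature filter, and every parameter value (`threshold`, `eps`, `min_samples`, `n_clusters`) are assumptions chosen for illustration, and the helper `clustering_accuracy` is a hypothetical name. k-medoids is omitted because it is not part of core scikit-learn.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler


def clustering_accuracy(y_true, y_pred):
    """Map each cluster to its majority true label and return the match rate.

    DBSCAN's noise label (-1) is treated here as one more cluster,
    which is a simplification made for this sketch.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]
        correct += np.bincount(members).max()
    return correct / len(y_true)


# Synthetic stand-in for account features: two well-separated groups
# ("human" = 0, "bot" = 1) described by four informative features.
X_info, y = make_blobs(
    n_samples=300,
    centers=[[0, 0, 0, 0], [6, 6, 6, 6]],
    cluster_std=1.0,
    random_state=0,
)

# Two near-constant columns play the role of uninformative features.
rng = np.random.default_rng(0)
X = np.hstack([X_info, rng.normal(0.0, 0.01, size=(300, 2))])

# Feature reduction (assumed method): discard low-variance columns,
# then standardize what remains.
X_selected = VarianceThreshold(threshold=0.1).fit_transform(X)
X_scaled = StandardScaler().fit_transform(X_selected)

# Cluster without using the labels.
models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=2),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
}

# Labels are used only afterwards, to evaluate the clusters.
scores = {
    name: clustering_accuracy(y, model.fit_predict(X_scaled))
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}")
```

The labels enter only at the evaluation step, which mirrors how an accuracy figure can be reported for an unsupervised method. On real account data the separation would be far less clean than in this toy example, so the scores printed here say nothing about the results reported in the paper.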

Full Text