Abstract

Bots on X are a significant problem in the social media landscape, and many shared links originate from bot-like accounts. This study applies the Isolation Forest algorithm, an unsupervised anomaly detection method, to identify bots as anomalies based on X account details. The dataset merges data from Botometer with supplementary metrics such as ‘average tweets per day’ and ‘account age in days’, contributed by David Martín Gutiérrez. It was adopted because of the increasing difficulty of accessing the X API. The dataset comprises 37,438 instances: 25,013 labeled as human accounts and 12,425 labeled as bot accounts. Pre-processing removes irrelevant features, and the dataset is split into Training, Validation, and Test sets in a 70:15:15 ratio. Hyperparameter and threshold tuning on the training set identifies the best configuration for this dataset (n_estimators: 50, contamination: 0.5, bootstrap: True), which achieves a training set F1-score of 0.211001. Despite this tuning, the model's performance remains low. Evaluation on the Test set yields precision, recall, and F1-score values of 0.1801, 0.2795, and 0.2190, respectively, with a ROC AUC of 0.3272, which is below the 0.5 expected from random ranking. While the Isolation Forest algorithm shows promise for detecting X bots, its performance on this dataset is limited, and it may not be the most suitable algorithm for this bot detection task. Future work will explore techniques to improve bot detection performance for a more comprehensive analysis.
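The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data is a synthetic stand-in generated with `make_classification` (the real dataset and its feature columns are not reproduced here), the sample size, feature count, and `random_state` values are assumptions, and only the 70:15:15 split and the three reported hyperparameters come from the abstract. The scores it prints will therefore not match the reported ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the account features (NOT the paper's dataset).
# The class weights only approximate the reported human:bot balance.
X, y = make_classification(
    n_samples=4000, n_features=8, n_informative=5,
    weights=[0.67, 0.33], random_state=0,
)  # assumption: y == 1 plays the role of "bot"

# 70:15:15 split: hold out 30%, then halve it into validation and test.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=0)

# Hyperparameters as reported in the abstract.
model = IsolationForest(
    n_estimators=50, contamination=0.5, bootstrap=True, random_state=0)
model.fit(X_train)  # unsupervised: the labels are not used for fitting

# predict() returns -1 for anomalies and +1 for inliers; map anomalies to "bot".
y_pred = (model.predict(X_test) == -1).astype(int)

# score_samples() is higher for normal points, so negate it to get a bot score.
bot_score = -model.score_samples(X_test)

metrics = {
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, bot_score),
}
print(metrics)
```

Isolation Forest treats points that are easy to isolate as anomalies, so it works best when the target class is rare and distinct. With roughly one third of the accounts labeled as bots, that premise is strained, which is one plausible reading of the low scores reported above. The validation split (`X_val`, `y_val`) is left unused here; it is where a threshold on `bot_score` could be tuned.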
