Abstract

Recent advances in the Internet and digital technology have brought a wide variety of activities into cyberspace, but they have also brought a surge in cyberattacks, making it more important than ever to detect and prevent cyberattacks. In this study, a method is proposed to detect anomalies in cyberspace by consolidating BGP (Border Gateway Protocol) data into numerical data that can be trained by machine learning (ML) through a tokenizer. BGP data comprise a mix of numeric and textual data, making it challenging for ML models to learn. To convert the data into a numerical format, a tokenizer, a preprocessing technique from Natural Language Processing (NLP), was employed. This process goes beyond merely replacing letters with numbers; its objective is to preserve the patterns and characteristics of the data. The Synthetic Minority Over-sampling Technique (SMOTE) was subsequently applied to address the issue of imbalanced data. Anomaly detection experiments were conducted on the model using various ML algorithms such as One-Class Support Vector Machine (One-SVM), Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM), Random Forest (RF), and Autoencoder (AE), and excellent performance in detection was demonstrated. In experiments, it performed best with the AE model, with an F1-Score of 0.99. In terms of the Area Under the Receiver Operating Characteristic (AUROC) curve, good performance was achieved by all ML models, with an average of over 90%. Improved cybersecurity is expected to be contributed by this research, as it enables the detection and monitoring of cyber anomalies from malicious users through BGP data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call