BGP Dataset-Based Malicious User Activity Detection Using Machine Learning

Dongkyoo Shin,Dongil Shin,Kookjin Kim,Hansol Park

doi:10.3390/info14090501

Abstract

Recent advances in the Internet and digital technology have brought a wide variety of activities into cyberspace, but they have also brought a surge in cyberattacks, making it more important than ever to detect and prevent cyberattacks. In this study, a method is proposed to detect anomalies in cyberspace by consolidating BGP (Border Gateway Protocol) data into numerical data that can be trained by machine learning (ML) through a tokenizer. BGP data comprise a mix of numeric and textual data, making it challenging for ML models to learn. To convert the data into a numerical format, a tokenizer, a preprocessing technique from Natural Language Processing (NLP), was employed. This process goes beyond merely replacing letters with numbers; its objective is to preserve the patterns and characteristics of the data. The Synthetic Minority Over-sampling Technique (SMOTE) was subsequently applied to address the issue of imbalanced data. Anomaly detection experiments were conducted on the model using various ML algorithms such as One-Class Support Vector Machine (One-SVM), Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM), Random Forest (RF), and Autoencoder (AE), and excellent performance in detection was demonstrated. In experiments, it performed best with the AE model, with an F1-Score of 0.99. In terms of the Area Under the Receiver Operating Characteristic (AUROC) curve, good performance was achieved by all ML models, with an average of over 90%. Improved cybersecurity is expected to be contributed by this research, as it enables the detection and monitoring of cyber anomalies from malicious users through BGP data.

Full Text