An empirical assessment of ML models for 5G network intrusion detection: A data leakage-free approach

Mohamed Aly Bouke, Azizol Abdullah

doi:10.1016/j.prime.2024.100590

Abstract

This paper compares thirteen Machine Learning (ML) models for intrusion detection systems (IDS) in a controlled environment. Unlike previous studies, we introduce an approach designed to avoid data leakage, which enhances the reliability of our findings. The study draws on the comprehensively labeled 5G-NIDD dataset, which covers a broad spectrum of network behaviors, from benign real-user traffic to various attack scenarios. Both the data preprocessing and the experimental design are structured to eliminate data leakage, which makes our results more robust and dependable than those of prior studies. The ML models are evaluated on accuracy, precision, recall, F1-score, ROC AUC, and execution time. Our results show that the K-Nearest Neighbors model is superior in accuracy and ROC AUC, while the Voting Classifier stands out in precision and F1-score. The Decision Tree, Bagging, and Extra Trees models exhibit strong recall scores. In contrast, the AdaBoost model falls short across all assessed metrics. The Naive Bayes model performs only modestly on the other metrics but excels in computational efficiency, with the quickest execution time. This paper emphasizes the importance of understanding the distinct strengths, drawbacks, and trade-offs of different ML models for network intrusion detection. It highlights that no single model is universally superior and that the choice hinges on the nature of the dataset, the specific application requirements, and the available computational resources.
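The abstract does not spell out the preprocessing steps, so the following is only a minimal sketch of one common way to keep preprocessing leakage-free: split the data first, then estimate scaling statistics from the training split alone and reuse them on the test split. The toy data and the helper names (`train_test_split`, `fit_scaler`, `apply_scaler`) are illustrative assumptions, not the authors' code or the 5G-NIDD features.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it; no statistics are computed here."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Estimate mean and standard deviation from training values only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical single-feature data: (feature value, label) pairs, made up for illustration.
rows = [(float(i), i % 2) for i in range(20)]

# 1. Split before any statistic is computed.
train, test = train_test_split(rows)

# 2. Fit the scaler on the training split only.
mean, std = fit_scaler([x for x, _ in train])

# 3. Apply the same training statistics to both splits.
train_scaled = apply_scaler([x for x, _ in train], mean, std)
test_scaled = apply_scaler([x for x, _ in test], mean, std)
```

Computing the mean and standard deviation over the full dataset before splitting would let information from the test rows influence the training features, which is the kind of leakage the abstract says the study is designed to avoid.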

Full Text