Abstract

An effective anomaly-based intelligent IDS (AN-Intel-IDS) must detect both known and unknown attacks. Hence, there is a need to train AN-Intel-IDS using dynamically generated, real-time data in an adversarial setting. Unfortunately, the public datasets available for training AN-Intel-IDS are ineluctably static, unrealistic, and prone to obsolescence. Furthermore, the need to protect private data and conceal sensitive data features has limited data sharing, thus encouraging the use of synthetic data for training predictive and intrusion detection models. However, synthetic data can be unrealistic and potentially biased. Real-time data, on the other hand, are realistic and current, but they are inherently imbalanced because anomalous and non-anomalous examples occur unevenly: normal examples are generally far more frequent than attack examples, leading to a skewed class distribution. Although imbalanced data predominate in intrusion detection applications, they can lead to inaccurate predictions and degraded performance. Moreover, the lack of real-time data produces potentially biased models that are less effective at predicting unknown attacks. Therefore, training AN-Intel-IDS with imbalanced learning and adversarial learning is instrumental to its efficacy and performance. This paper investigates imbalanced learning and adversarial learning for training AN-Intel-IDS through a qualitative study. It surveys and synthesizes generative-based data augmentation techniques for addressing uneven data distribution and generative-based adversarial techniques for generating synthetic yet realistic data in an adversarial setting, using rapid review, structured reporting, and subgroup analysis.
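To make the generative-based data augmentation discussed above concrete, the following is a minimal, hedged sketch of a GAN trained to mimic minority (attack) feature vectors so that its synthetic output can rebalance a training set. It is not the implementation of any surveyed work; the PyTorch architecture, feature dimension, and training settings are illustrative assumptions.

```python
# A minimal, hedged sketch of GAN-based minority-class augmentation for tabular
# intrusion-detection features (assumes PyTorch; dimensions are illustrative).
import torch
import torch.nn as nn

FEATURES = 20   # assumed number of flow features per example
LATENT = 16     # assumed dimension of the generator's noise input


class Generator(nn.Module):
    """Maps random noise to a synthetic minority-class feature vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(),
                                 nn.Linear(64, FEATURES))

    def forward(self, z):
        return self.net(z)


class Discriminator(nn.Module):
    """Scores how 'real' a feature vector looks (logit output)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEATURES, 64), nn.LeakyReLU(0.2),
                                 nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)


def train_gan(minority_x, epochs=200, batch=64):
    """Train the generator on a float tensor of minority (attack) examples."""
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        real = minority_x[torch.randint(0, minority_x.size(0), (batch,))]
        fake = G(torch.randn(batch, LATENT))
        # Discriminator step: label real samples 1 and generated samples 0.
        d_loss = (bce(D(real), torch.ones(batch, 1)) +
                  bce(D(fake.detach()), torch.zeros(batch, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
        # Generator step: push D to label generated samples as real.
        g_loss = bce(D(fake), torch.ones(batch, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
    return G

# Hypothetical usage: synth = train_gan(attack_tensor)(torch.randn(500, LATENT))
```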

Highlights

  • In a binary classification problem, such as anomaly-based detection, where the dataset contains two sets of examples, it is common to encounter class imbalance

  • The authors used balanced data generated by a generative adversarial network (GAN), which addressed overfitting and overlapping by specifying the desired resampling rate, to train an anomaly-based detection model based on the random forest (RF) method, increasing the weight of the minority attack class in the Intrusion Detection Evaluation Dataset (CICIDS); a hedged sketch of this workflow follows these highlights

  • Our initial focus was to categorize the surveyed data-driven learning (DDL) methods and techniques into data augmentation and data generation, based on the class of problem they attempt to solve, and into adversarial and non-adversarial learning, based on their learning approach
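
The highlighted workflow, GAN-balanced data plus a class-weighted random forest, can be approximated with the hedged scikit-learn sketch below. It is not the surveyed authors' code: the arrays stand in for real CICIDS features and GAN output, and the class-weight ratio is purely illustrative.

```python
# A hedged sketch: train a random forest on a CICIDS-like feature matrix after
# appending GAN-generated attack samples and up-weighting the minority (attack)
# class. X_real, y_real, and synthetic_attacks are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X_real = rng.normal(size=(5000, 20))             # stand-in for real flow features
y_real = (rng.random(5000) < 0.05).astype(int)   # ~5% attacks: imbalanced labels
synthetic_attacks = rng.normal(size=(1000, 20))  # stand-in for GAN-generated attacks

# Merge real data with synthetic attack samples (label 1 = attack).
X = np.vstack([X_real, synthetic_attacks])
y = np.concatenate([y_real, np.ones(len(synthetic_attacks), dtype=int)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight raises the penalty for misclassifying the minority attack class;
# the 1:5 ratio here is an illustrative assumption, not the paper's setting.
rf = RandomForestClassifier(n_estimators=200,
                            class_weight={0: 1, 1: 5},
                            random_state=0)
rf.fit(X_tr, y_tr)
print(classification_report(y_te, rf.predict(X_te)))
```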


Summary

Introduction

In a binary classification problem, such as anomaly-based detection, where the dataset contains two sets of examples (normal and anomalous), it is common to encounter class imbalance. Class imbalance generally occurs when the normal set contains significantly more examples, or samples, than the anomalous set, dividing the dataset into majority and minority class samples. Data imbalance, or uneven class distribution, can cause an AN-Intel-IDS model to over-classify the normal class because of its higher probability in the dataset relative to the anomalous one. Resampling techniques, which are typically applied before learning, adjust the class distribution to mitigate the data imbalance problem. Oversampling and undersampling techniques focus on balancing the distribution of the majority and minority classes in the dataset. While oversampling and undersampling reduce data imbalance using the existing examples, SMOTE, an intelligent data resampling technique, reduces the degree of imbalance by synthetically creating new minority class examples [13]. In SMOTE, overfitting is less of a concern than class overlapping, which results from interpolating between closely adjacent instances of the minority class.
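
As a concrete illustration of the resampling techniques described above, the hedged sketch below applies random undersampling, random oversampling, and SMOTE with the imbalanced-learn library. The dataset is a synthetic stand-in generated by scikit-learn, not an IDS benchmark, and the class ratio and feature count are assumptions.

```python
# A minimal sketch of the resampling techniques discussed above, using the
# imbalanced-learn (imblearn) library on a synthetic two-class dataset.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# 95% normal (majority) vs 5% anomalous (minority) examples.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
print("original:", Counter(y))

# Undersampling: discard majority examples until the classes are balanced.
X_u, y_u = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("undersampled:", Counter(y_u))

# Oversampling: duplicate minority examples (risk of overfitting).
X_o, y_o = RandomOverSampler(random_state=0).fit_resample(X, y)
print("oversampled:", Counter(y_o))

# SMOTE: interpolate between neighbouring minority instances to create new,
# synthetic minority examples (risk of class overlapping near boundaries).
X_s, y_s = SMOTE(random_state=0).fit_resample(X, y)
print("SMOTE:", Counter(y_s))
```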

