A GAN and Feature Selection-Based Oversampling Technique for Intrusion Detection

Xiaodong Liu, Zhen Yang, Di Wu, Yongheng Liu, Tong Li, Runzi Zhang, Savio Sciancalepore

doi:10.1155/2021/9947059

Abstract

In recent years, numerous cyber security issues have caused considerable damage to society. The development of efficient and reliable Intrusion Detection Systems (IDSs) is an effective countermeasure against the growing cyber threats. In modern high-bandwidth, large-scale network environments, traditional IDSs suffer from high rates of missed detections and false alarms. Researchers have introduced machine learning techniques into intrusion detection with good results. However, because attack data is scarce, the training sets for such methods are usually unbalanced, which degrades analysis performance. In this paper, we survey and analyze the design principles and shortcomings of existing oversampling methods. Based on the findings, we take the perspective of imbalance and high dimensionality of datasets in the field of intrusion detection and propose an oversampling technique based on Generative Adversarial Networks (GAN) and feature selection. Specifically, we model the complex high-dimensional distribution of attacks with a Wasserstein GAN with Gradient Penalty (WGAN-GP) to generate additional attack samples. We then select a subset of features representing the entire dataset based on analysis of variance, ultimately generating a rebalanced low-dimensional dataset for machine learning training. To evaluate the effectiveness of our proposal, we conducted experiments on the NSL-KDD, UNSW-NB15, and CICIDS-2017 datasets. The experimental results show that our method effectively improves the detection performance of machine learning models and outperforms the baselines.

Highlights

In recent years, there have been numerous cyber security issues that have caused considerable damage to society. The development of efficient and reliable Intrusion Detection Systems (IDSs) is an effective countermeasure against the growing cyber threats.
We take the perspective of imbalance and high dimensionality of datasets in the field of intrusion detection and propose an oversampling technique based on Generative Adversarial Networks (GAN) and feature selection.
We can draw the same conclusion from the comparison in Figure 8. This is because WGAN-GP can learn the distribution of the attack samples.

Summary

Introduction

There have been numerous cyber security issues that have caused considerable damage to society. The development of efficient and reliable Intrusion Detection Systems (IDSs) is an effective countermeasure against the growing cyber threats, and IDSs have been widely adopted to detect and defend against network attacks. An IDS monitors network traffic in real time, divides network records into normal records and malicious records, and provides essential information for the defense system. When machine learning is used for intrusion detection, the scarcity of attack data leaves the training set unbalanced, which degrades analysis performance [2]. We take the perspective of imbalance and high dimensionality of datasets in the field of intrusion detection and propose an oversampling technique based on Generative Adversarial Networks (GAN) and feature selection.

Methods
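
The abstract states that, after oversampling, a subset of features is selected based on analysis of variance. The sketch below illustrates what such a step could look like using a one-way ANOVA F-statistic per feature. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `anova_f_score` and `select_top_k`, the top-k selection rule, and the toy data are all hypothetical.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    group_means = {y: mean(g) for y, g in groups.items()}
    # Variation of class means around the overall mean.
    ss_between = sum(
        len(g) * (group_means[y] - grand_mean) ** 2 for y, g in groups.items()
    )
    # Variation of samples around their own class mean.
    ss_within = sum(
        (v - group_means[y]) ** 2 for y, g in groups.items() for v in g
    )
    if ss_within == 0:
        # Constant within every class: perfectly separating or uninformative.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows, labels, k):
    """Indices of the k features with the highest F-scores (hypothetical rule)."""
    n_features = len(rows[0])
    scores = [
        anova_f_score([row[j] for row in rows], labels) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (made up): feature 0 separates the classes, feature 1 has identical
# class means, feature 2 is constant.
rows = [
    [0.10, 5.0, 1.0],
    [0.20, 3.0, 1.0],
    [0.15, 4.0, 1.0],
    [0.90, 4.0, 1.0],
    [1.00, 5.0, 1.0],
    [0.95, 3.0, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = attack
print(select_top_k(rows, labels, 1))  # [0]
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-score, so keeping the top-scoring features yields a lower-dimensional dataset that still discriminates between normal and attack records.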

Results

Conclusion