Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset

Sydney M Kasongo, Yanxia Sun

doi:10.1186/s40537-020-00379-6

Open Access

Journal: Journal of Big Data	Publication Date: Nov 25, 2020
Citations: 215	License type: open-access

Affiliation: University of Johannesburg

Abstract

Computer network intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical aspects that contribute to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using machine learning (ML) techniques. IDSs based on ML methods are effective and accurate in detecting network attacks. However, the performance of these systems decreases for high-dimensional data spaces. Therefore, it is crucial to implement an appropriate feature extraction method that can prune the features that have little impact on the classification process. Moreover, many ML-based IDSs suffer from an increased false positive rate and a low detection accuracy when the models are trained on highly imbalanced datasets. In this paper, we present an analysis of the UNSW-NB15 intrusion detection dataset, which is used for training and testing our models. We then apply a filter-based feature reduction technique using the XGBoost algorithm and implement the following ML approaches on the reduced feature space: Support Vector Machine (SVM), k-Nearest-Neighbour (kNN), Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree (DT). In our experiments, we considered both the binary and multiclass classification configurations. The results demonstrated that the XGBoost-based feature selection method allows methods such as the DT to increase their test accuracy from 88.13 to 90.85% for the binary classification scheme.
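The filter-based reduction step described in the abstract can be pictured as ranking features by an importance score and keeping only those above a cut-off. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: in the paper the scores come from XGBoost, whereas here the `importances` values, the `threshold` of 0.10, the sample records, and the helper names `select_features` and `project` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def select_features(importances: Dict[str, float], threshold: float) -> List[str]:
    """Return the feature names whose importance is at least `threshold`,
    ordered from most to least important."""
    kept = [name for name, score in importances.items() if score >= threshold]
    return sorted(kept, key=lambda name: importances[name], reverse=True)


def project(rows: Sequence[Dict[str, float]], features: Sequence[str]) -> List[List[float]]:
    """Reduce each record to the selected features, in the given order."""
    return [[row[name] for name in features] for row in rows]


# Hypothetical importance scores (NOT values reported in the paper).
# The feature names are used for illustration only.
importances = {
    "sttl": 0.41,
    "ct_state_ttl": 0.22,
    "sbytes": 0.15,
    "dur": 0.02,
    "swin": 0.01,
}
selected = select_features(importances, threshold=0.10)

# Two made-up traffic records, reduced to the selected feature space.
rows = [
    {"sttl": 254.0, "ct_state_ttl": 2.0, "sbytes": 496.0, "dur": 0.12, "swin": 255.0},
    {"sttl": 31.0, "ct_state_ttl": 0.0, "sbytes": 1540.0, "dur": 1.05, "swin": 255.0},
]
reduced = project(rows, selected)
```

The reduced rows would then be passed to any of the classifiers listed in the abstract. Because the selection happens before and independently of the downstream classifier, the same reduced feature set can be reused across SVM, kNN, LR, ANN and DT.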

Highlights

The rapid pace at which technologies such as the Internet, Internet-of-Things (IoT) and communication systems are advancing has caused hackers to evolve with a higher velocity in terms of their capabilities
Performance Metrics: There exist a number of metrics to evaluate machine learning (ML) based Intrusion Detection Systems (IDSs); this research aims to maximize the correct predictions of instances in the test dataset
Experiments and Results: Our experimental design was organized into two main processes in which we considered the following ML techniques: Artificial Neural Network (ANN), Logistic Regression (LR), kNN, Support Vector Machine (SVM) and Decision Tree (DT)
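As a rough illustration of the performance metrics highlight, the sketch below computes accuracy (the share of correct predictions on the test set) and the false positive rate mentioned in the abstract from binary labels. It uses the standard textbook definitions; the excerpt does not list the exact metric set used in the paper, and the labels and function names here are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for binary labels where 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances that were predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(tn: int, fp: int) -> float:
    """Fraction of normal instances that were wrongly flagged as attacks."""
    negatives = tn + fp
    return fp / negatives if negatives else 0.0


# Made-up test labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)       # 6 correct out of 8 -> 0.75
fpr = false_positive_rate(tn, fp)    # 1 false alarm out of 4 normal -> 0.25
```

On imbalanced data, accuracy alone can look high while the false positive rate remains poor, which is why the two are worth reporting side by side.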

Summary

Introduction

The rapid pace at which technologies such as the Internet, Internet-of-Things (IoT) and communication systems are advancing has caused hackers to evolve with a higher velocity in terms of their capabilities. These criminals strive to discover new ways to compromise computer network security. IDSs are categorized into three broad classes, namely, node or host based IDS (HIDS) (Fig. 1), network or distributed IDS (DIDS or NIDS) (Fig. 2) and Hybrid IDS (HYIDS) (Fig. 3). This classification is based on the specific IDS operation philosophy. HYIDS is a combination of DIDS and HIDS [2, 3].

Objectives

Methods

Results

Conclusion