Analysis of Machine Learning Algorithms with Feature Selection for Intrusion Detection using UNSW-NB15 Dataset

Geeta Kocher, Gulshan Kumar

doi:10.5121/ijnsa.2021.13102

Abstract

Machine learning classifiers are widely used to improve network intrusion detection, and many solutions have been proposed in the literature. However, these classifiers are often trained on older datasets, which limits their detection accuracy, so there is a need to train them on a more recent dataset. In this paper, the UNSW-NB15 dataset is used to train machine learning classifiers. Five classifiers, K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), Random Forest (RF), Logistic Regression (LR), and Naïve Bayes (NB), are selected from the taxonomy of classifiers based on lazy and eager learners. Chi-Square, a filter-based feature selection technique, is applied to the UNSW-NB15 dataset to remove irrelevant and redundant features. Classifier performance is measured in terms of Accuracy, Mean Squared Error (MSE), Precision, Recall, F1-Score, True Positive Rate (TPR), and False Positive Rate (FPR), both with and without feature selection, and a comparative analysis of the classifiers is carried out.
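The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset as a stand-in for UNSW-NB15, and the choices of `k=20`, `GaussianNB` as the Naïve Bayes variant, min-max scaling, and all hyperparameters are assumptions made only for the example.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic binary data stands in for UNSW-NB15 (normal vs. attack).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           n_redundant=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The five classifier families named in the abstract; hyperparameters are assumed.
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SGD": SGDClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
}

def evaluate(k=None):
    """Return test accuracy per classifier, optionally keeping the top-k chi-square features."""
    results = {}
    for name, clf in classifiers.items():
        # chi2 requires non-negative inputs, so features are scaled to [0, 1] first.
        steps = [MinMaxScaler()]
        if k is not None:
            steps.append(SelectKBest(chi2, k=k))
        steps.append(clone(clf))
        model = make_pipeline(*steps)
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results

baseline = evaluate()      # all 40 features
selected = evaluate(k=20)  # top 20 features by chi-square score (assumed k)

for name in classifiers:
    print(f"{name}: all features={baseline[name]:.3f}, chi2 top-20={selected[name]:.3f}")
```

Putting the scaler and selector inside the pipeline means both are fitted on the training split only, so the test split does not influence which features are kept.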

Highlights

It is challenging to protect confidential data from the eyes of attackers.
Researchers have put considerable effort into ML algorithms for this problem. For example, Narudin et al. (2014) [5] evaluated the ML classifiers Random Forest (RF), J-48, MLP, Naïve Bayes (NB), and K-Nearest Neighbors (KNN) for detecting mobile malware, using the MalGenome dataset and private datasets with the Weka tool.
The simulated results on the UNSW-NB15 dataset showed that the proposed method achieved a 98.6% Detection Rate (DR), 98.9% accuracy, and a 0.13% False Positive Rate (FPR).

Summary

Introduction

It is challenging to protect confidential data from the eyes of attackers. Traditional methods such as firewalls and antivirus software fail to handle all types of attacks. Researchers have put considerable effort into ML algorithms for this problem, and some of their contributions are described below. Narudin et al. (2014) [5] evaluated the ML classifiers RF, J-48, MLP, NB, and KNN for detecting mobile malware, using the MalGenome dataset and private datasets with the Weka tool. Performance metrics such as TPR, FPR, precision, recall, and F-measure were used to validate the performance of the ML algorithms. There is a need for classifiers that can be used for multiclass classification.
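The metrics named above all follow from the four confusion-matrix counts. The sketch below shows the standard definitions for the binary case (1 = attack, 0 = normal). The function name `binary_metrics` and the toy labels are hypothetical, chosen only for this example. Note that for 0/1 labels the MSE equals the misclassification rate, that is, 1 minus accuracy.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary metrics from 0/1 labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to TPR (detection rate)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / n,
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tpr": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

# Toy labels (hypothetical): 4 attacks and 4 normal records.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For multiclass classification, the same counts are typically computed per class in a one-vs-rest fashion and then averaged.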

Objectives

Methods

Findings

Conclusion