Investigating the Effect of Traffic Sampling on Machine Learning-Based Network Intrusion Detection Approaches

Jumabek Alikhanov, David Mohaisen, Daehun Nyang, Rhongho Jang, Youngtae Noh, Mohammed Abuhamad

doi:10.1109/access.2021.3137318

Abstract

Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) operate on flow features obtained from flow-export protocols (e.g., NetFlow). Recent successful ML- and Deep Learning (DL)-based NIDS solutions assume that such flow information (e.g., average packet size) is obtained from all packets of the flow. In practice, however, the flow exporter is often deployed on commodity devices where packet sampling is inevitable. As a result, the applicability of such ML-based NIDS solutions in the presence of sampling (i.e., when flow information is obtained from a sampled set of packets instead of the full traffic) is an open question. In this study, we explore the impact of packet sampling on the performance and efficiency of ML-based NIDSs. Unlike previous work, our proposed evaluation procedure is immune to different settings of the flow export stage. Hence, it can provide a robust evaluation of NIDSs even in the presence of sampling. Through sampling experiments, we established that malicious flows of shorter size (i.e., fewer packets) are likely to go unnoticed even at mild sampling rates such as 1/10 and 1/100. Next, using the proposed evaluation procedure, we investigated the impact of various sampling techniques on the NIDS detection rate and false alarm rate. Both rates are computed for three sampling rates (1/10, 1/100, 1/1000), four sampling techniques, and three classifiers (two tree-based, one deep learning-based). Experimental results show that the systematic linear sampler SketchFlow performs better than non-linear samplers such as Sketch Guided and Fast Filtered sampling. We also found that the random forest classifier with SketchFlow sampling performed better than other sampler-classifier combinations, showing a higher detection rate and a lower false alarm rate across multiple sampling rates.
Our results are consistent across multiple sampling rates; the exception is Sketch Guided Sampling (SGS), which caused a drastic performance drop when the sampling rate was changed from 1/100 to 1/1000. Our results provide valuable insights for network practitioners and researchers into how packet sampling affects ML-based NIDS performance. The full source code for the sampling and ML experiments has been released: github.com/Jumabek/sampledFlowMeter and github.com/Jumabek/nids-with-sampling
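The flow-visibility observation in the abstract can be illustrated with a small calculation. The sketch below assumes independent uniform random packet sampling at rate p, which is a simplification and not one of the four samplers evaluated in the study. Under that assumption, a flow of n packets appears in the exported records only if at least one of its packets is sampled, which happens with probability 1 - (1 - p)^n. The function name `flow_visibility` is hypothetical.

```python
def flow_visibility(n_packets: int, rate: float) -> float:
    """Probability that at least one packet of a flow is sampled.

    Assumes each packet is sampled independently with probability `rate`
    (uniform random sampling). This is an illustrative model only, not a
    sampler from the study.
    """
    if n_packets < 0:
        raise ValueError("n_packets must be non-negative")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return 1.0 - (1.0 - rate) ** n_packets


if __name__ == "__main__":
    # The three sampling rates mentioned in the abstract.
    for rate in (1 / 10, 1 / 100, 1 / 1000):
        for n in (1, 10, 100, 1000):
            print(f"rate={rate:<6} packets={n:<5} "
                  f"P(visible)={flow_visibility(n, rate):.4f}")
```

Under this model, a 10-packet flow is seen with a probability of roughly 65% at a rate of 1/10 and under 10% at 1/100, which is in line with the statement that short malicious flows are likely to go unnoticed even at mild sampling rates.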

Highlights

Network monitoring applications such as flow analysis, intrusion detection, and performance monitoring have become increasingly popular owing to the continuous increase in the speed and volume of network traffic [1]
Evaluation framework for flow-level Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs): (1) in contrast to findings in previous literature [14], the time to train a convolutional neural network (CNN) is reduced by 3× using larger batch sizes without sacrificing performance, (2) a 42% higher detection rate (DR) is achieved by addressing the training-data imbalance, and (3) a flow-level evaluation framework is proposed that remains reliable even when the number of extracted flow records varies owing to the configuration of the flow metering and export stage
This study focuses on Misuse Detection (MD)-based NIDSs that operate on flow information, with an emphasis on the effects of traffic sampling
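For readers unfamiliar with the two metrics named in the abstract and highlights, the following sketch computes detection rate (DR) and false alarm rate (FAR) using their common binary definitions: DR = TP / (TP + FN) and FAR = FP / (FP + TN). This is an assumption about the definitions; the study's flow-level evaluation procedure may compute them differently (for example, per attack class). The function name and the label encoding (1 = malicious, 0 = benign) are hypothetical.

```python
from typing import Sequence, Tuple


def detection_and_false_alarm_rate(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (DR, FAR) for binary labels: 1 = malicious, 0 = benign.

    DR  = TP / (TP + FN): share of malicious flows that were flagged.
    FAR = FP / (FP + TN): share of benign flows that were flagged.
    A rate is NaN when its denominator is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    dr = tp / (tp + fn) if (tp + fn) else float("nan")
    far = fp / (fp + tn) if (fp + tn) else float("nan")
    return dr, far


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 1, 0, 0, 0, 0]
    dr, far = detection_and_false_alarm_rate(truth, preds)
    print(f"DR={dr:.3f} FAR={far:.3f}")
```

Because benign flows usually far outnumber malicious ones, reporting DR and FAR separately keeps the imbalance from hiding missed attacks, which is one reason a plain accuracy figure can be misleading in this setting.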

Summary

INTRODUCTION

Network monitoring applications such as flow analysis, intrusion detection, and performance monitoring have become increasingly popular owing to the continuous increase in the speed and volume of network traffic [1]. [...] traffic, even flow record export has become a challenge for commodity devices (e.g., switches, routers). This is because processing each packet requires a certain amount of bandwidth, memory, and CPU cycles on the measuring device. Recent research has proposed a large body of machine learning (ML) and deep learning (DL) solutions for flow-based NIDSs [6]–[13]. These solutions have demonstrated promising results in terms of their robust detection rates (DRs).

RELATED WORK

NIDS IN THE PRESENCE OF SAMPLING

FLOW-LEVEL ML-BASED NIDS

Method

EXPERIMENTAL SETTING

CLASSIFIERS

EXPERIMENTS

FLOW VISIBILITY

FLOW-BASED NIDS ON SAMPLED DATA

Findings

CONCLUSION AND FUTURE PERSPECTIVES


Journal: IEEE access : practical innovations, open solutions
Publication Date: Jan 1, 2022
Citations: 11
License type: CC BY 4.0

