Abstract

Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) operate on flow features obtained from flow-exporting protocols (i.e., NetFlow). The recent success of ML and Deep Learning (DL) based NIDS solutions assumes that such flow information (e.g., average packet size) is obtained from all packets of the flow. In practice, however, the flow exporter is often deployed on commodity devices where packet sampling is inevitable. As a result, the applicability of such ML-based NIDS solutions in the presence of sampling (i.e., when flow information is obtained from a sampled set of packets instead of the full traffic) is an open question. In this study, we explore the impact of packet sampling on the performance and efficiency of ML-based NIDSs. Unlike previous work, our proposed evaluation procedure is immune to different settings of the flow export stage. Hence, it can provide a robust evaluation of NIDSs even in the presence of sampling. Through sampling experiments, we established that malicious flows of shorter size (i.e., number of packets) are likely to go unnoticed even at mild sampling rates such as 1/10 and 1/100. Next, using the proposed evaluation procedure, we investigated the impact of various sampling techniques on the NIDS detection rate and false alarm rate. Detection rate and false alarm rate are computed for three sampling rates (i.e., 1/10, 1/100, 1/1000), four sampling techniques, and three classifiers (two tree-based, one deep learning based). Experimental results show that the systematic linear sampler, SketchFlow, performs better than non-linear samplers such as Sketch Guided and Fast Filtered sampling. We also found that the random forest classifier with SketchFlow sampling was the better combination: it showed a higher detection rate and a lower false alarm rate across multiple sampling rates than other sampler-classifier combinations.
Our results are consistent across multiple sampling rates; the exception is Sketch Guided Sampling (SGS), which caused a drastic performance drop when the sampling rate changed from 1/100 to 1/1000. Our results provide valuable insights for network practitioners and researchers on how packet sampling affects ML-based NIDS performance. The full source code for the sampling and ML experiments has been released: github.com/Jumabek/sampledFlowMeter and github.com/Jumabek/nids-with-sampling
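
The observation that short flows tend to go unnoticed under sampling can be illustrated with a small sketch. It is not taken from the released code: the function names `miss_probability` and `visible_flows` are hypothetical, the first function assumes simple independent per-packet random sampling (which is not necessarily any of the four samplers evaluated in the study), and the second assumes plain 1-in-N systematic sampling over packets in arrival order.

```python
def miss_probability(n_packets: int, rate: float) -> float:
    """Chance that no packet of an n-packet flow is sampled.

    Assumes each packet is kept independently with probability `rate`
    (simple random sampling; an illustrative assumption only).
    """
    if n_packets < 0:
        raise ValueError("n_packets must be non-negative")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return (1.0 - rate) ** n_packets


def visible_flows(packet_flow_ids, interval, offset=0):
    """Flows that keep at least one packet under 1-in-`interval` systematic sampling.

    `packet_flow_ids` holds the flow id of each packet in arrival order;
    the packet at position i is kept when i % interval == offset % interval.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    keep = offset % interval
    return {fid for i, fid in enumerate(packet_flow_ids) if i % interval == keep}


if __name__ == "__main__":
    # Compare short and long flows at the three sampling rates from the abstract.
    for rate in (1 / 10, 1 / 100, 1 / 1000):
        for n in (1, 10, 100):
            p = miss_probability(n, rate)
            print(f"rate={rate:g} packets={n} P(flow missed)={p:.3f}")
```

Under these assumptions the miss probability is (1 - rate)^n, so it approaches 1 as flows get shorter, which matches the direction of the finding above. The actual samplers in the study may behave differently.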

Highlights

  • Network monitoring applications such as flow analysis, intrusion detection, and performance monitoring have become increasingly popular owing to the continuous increase in the speed and volume of network traffic [1]

  • Evaluation framework for flow-level Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs): (1) in contrast to findings in previous literature [14], the time to train a convolutional neural network (CNN) is reduced by 3× using larger batch sizes without sacrificing performance, (2) a 42% higher detection rate (DR) is achieved by addressing the training-data imbalance, and (3) a flow-level evaluation framework is proposed that remains reliable even when the number of extracted flow records varies owing to the configuration of the flow metering and export stage

  • This study focuses on Misuse Detection (MD) based NIDSs that operate on flow information, with an emphasis on the effects of traffic sampling
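
The detection rate (DR) and false alarm rate mentioned above can be sketched with the conventional binary definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN). This is a minimal illustration under assumptions: the function name `detection_and_false_alarm_rate` is hypothetical, labels are taken to be 1 for malicious and 0 for benign, and the study's own flow-level evaluation procedure may compute these quantities differently.

```python
def detection_and_false_alarm_rate(y_true, y_pred):
    """Return (DR, FAR) for binary labels, with 1 = malicious and 0 = benign.

    DR  = TP / (TP + FN): share of malicious flows that were flagged.
    FAR = FP / (FP + TN): share of benign flows that were wrongly flagged.
    A rate is reported as 0.0 when its denominator is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 0, 1, 0, 0, 0]
    dr, far = detection_and_false_alarm_rate(truth, preds)
    print(f"DR={dr:.3f} FAR={far:.3f}")
```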

Summary

INTRODUCTION

Network monitoring applications such as flow analysis, intrusion detection, and performance monitoring have become increasingly popular owing to the continuous increase in the speed and volume of network traffic [1]. [...] traffic, even flow record export has become a challenge for commodity devices (i.e., switches, routers). This is because processing each packet requires a certain amount of bandwidth, memory, and CPU cycles on the measuring device. Recent research has proposed a large body of machine learning (ML) and deep learning (DL) solutions for flow-based NIDSs [6]–[13]. These solutions have demonstrated promising results in terms of their robust detection rates (DRs).

RELATED WORK
NIDS IN THE PRESENCE OF SAMPLING
FLOW-LEVEL ML-BASED NIDS
Method
EXPERIMENTAL SETTING
CLASSIFIERS
EXPERIMENTS
FLOW VISIBILITY
FLOW-BASED NIDS ON SAMPLED DATA
Findings
CONCLUSION AND FUTURE PERSPECTIVES