Abstract
The lack of publicly available, up-to-date datasets makes it difficult to evaluate intrusion detection systems. This paper introduces HIKARI-2021, a dataset that contains encrypted synthetic attacks and benign traffic. The dataset conforms to two sets of requirements: content requirements, which concern the produced dataset, and process requirements, which concern how the dataset is built. We compile these requirements to support future dataset development, and we make the HIKARI-2021 dataset, along with the procedures used to build it, available to the public.
Highlights
It is challenging to estimate how much methods for detecting malicious traffic have improved in the intrusion detection system (IDS) field.
Between 2017 and 2021, published papers used a mix of methods, such as [49,50]. Rajagopal et al. [49] argued that conventional machine learning methods were ineffective and instead used stacking ensembles to improve performance and produce reliable predictions, while [50] proposed a hybridized multi-model system to improve intrusion detection accuracy.
Most of the features were adopted from CICIDS-2017, while uid, originh, originp, responh, responp, traffic_category, and Label were derived from Zeek
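As a minimal sketch of how the column split described above might be handled when loading the data, the snippet below separates the Zeek-derived columns from the remaining flow features. The column names come from the text. The helper name split_columns, the two flow feature names, and all sample values are hypothetical and only illustrate the idea. The choice to treat the Zeek-derived columns as identifiers and labels rather than model inputs is also an assumption.

```python
import csv
import io

# Column names as listed in the text; treated here as identifiers and labels,
# not as model inputs (an assumption about how the data would be used).
ZEEK_COLUMNS = ["uid", "originh", "originp", "responh", "responp",
                "traffic_category", "Label"]


def split_columns(header):
    """Split a CSV header into (Zeek-derived columns, remaining flow features)."""
    zeek = [name for name in header if name in ZEEK_COLUMNS]
    features = [name for name in header if name not in ZEEK_COLUMNS]
    return zeek, features


# Tiny in-memory stand-in for the real file. The two flow feature names and
# every value in the row are made up for illustration.
sample = io.StringIO(
    "uid,originh,originp,responh,responp,flow_duration,fwd_pkts_tot,"
    "traffic_category,Label\n"
    "C1,10.0.0.1,50000,10.0.0.2,443,1.5,12,Benign,0\n"
)
reader = csv.DictReader(sample)
zeek_cols, feature_cols = split_columns(reader.fieldnames)
rows = list(reader)
```

Keeping the endpoint columns (uid, addresses, ports) out of the feature list is a common precaution so that a classifier does not simply memorize which hosts generated the attacks.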
Summary
It is challenging to estimate how much methods for detecting malicious traffic have improved in the intrusion detection system (IDS) field. Factors that make datasets difficult to compare include a lack of proper documentation of the methods [1], a lack of comparison methodology [2], and a lack of important features, such as ground-truth labels, public availability, and real-world environment traffic. Handling ground truth is a real challenge, especially when experts cannot determine whether traffic is an attack or benign. This is one reason why researchers use synthetic traffic. We present a tool and requirements for creating a new dataset by generating encrypted network traffic in a real-world environment.