Abstract

Intrusion detection systems (IDSs) are used to detect attacks in a network. Machine learning (ML) approaches have been widely used to build such systems because they are more accurate when built from a very large and representative dataset. In recent times, one of the benchmark datasets used to build ML-based intrusion detection models is the CICIDS2017 dataset. The dataset is organized into eight groups and was collected from the Canadian Institute for Cybersecurity dataset repository. It is available in both PCAP and netflow formats. This study used the netflow records in the CICIDS2017 dataset because they contain newer attacks, are very large, and are useful for traffic analysis. Exploratory Data Analysis (EDA) techniques were used to reveal various characteristics of the dataset. The general objective is to provide more insight into the nature, structure, and issues of the dataset so as to identify the best ways of using it to achieve improved ML-based IDS models. Furthermore, some of the open problems that can arise from using the dataset in any ML-based intrusion detection system are highlighted, and possible solutions are briefly discussed. The EDA techniques used revealed important relationships between the input variables and the target class. The study concluded that EDA can better inform the decisions of future IDS research that uses the dataset. Thus, improved ML-based intrusion detection systems can be built from the dataset once it is well understood and pre-processed.
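
As an illustration of the kind of EDA checks described above, the following is a minimal sketch that counts records per target class and flags non-finite feature values in a flow-record CSV. The sample rows, the column names (`Flow Duration`, `Flow Bytes/s`, `Label`), and the helper `summarize` are assumptions made for this example; they are not the study's code or data.

```python
import csv
import io
import math
from collections import Counter

# Tiny synthetic stand-in for a flow-record CSV; the column names and values
# are illustrative assumptions, not taken from the study.
SAMPLE = """Flow Duration,Flow Bytes/s,Label
120,1500.0,BENIGN
95,Infinity,BENIGN
300,NaN,DoS
40,220.5,BENIGN
10,9000.0,PortScan
"""

def summarize(csv_text, label_col="Label"):
    """Return (class counts, per-column count of unusable numeric values)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    class_counts = Counter(row[label_col] for row in rows)
    bad_values = Counter()
    for row in rows:
        for col, raw in row.items():
            if col == label_col:
                continue
            try:
                value = float(raw)
            except ValueError:
                bad_values[col] += 1  # entry that cannot be parsed as a number
                continue
            if not math.isfinite(value):
                bad_values[col] += 1  # NaN or +/-Infinity
    return class_counts, bad_values

class_counts, bad_values = summarize(SAMPLE)
total = sum(class_counts.values())
for label, count in class_counts.most_common():
    # Share of each class, to make any imbalance visible at a glance.
    print(f"{label}: {count} ({count / total:.0%})")
print(dict(bad_values))
```

On real flow records, the same two summaries (class distribution and unusable values per column) would typically guide pre-processing choices such as resampling or dropping and imputing rows before a model is trained.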
