Abstract
Intrusion detection systems (IDSs) are used to detect attacks in a network. Machine learning (ML) approaches have been widely used to build IDSs because they are more accurate when built from a very large and representative dataset. In recent times, one of the benchmark datasets used to build ML-based intrusion detection models is the CICIDS2017 dataset. The dataset is contained in eight groups and was collected from the Canadian Institute for Cybersecurity dataset repository. It is available in both PCAP and netflow formats. This study used the netflow records in the CICIDS2017 dataset because they contain newer attacks, are very large, and are useful for traffic analysis. Exploratory Data Analysis (EDA) techniques were used to reveal various characteristics of the dataset. The general objective is to provide more insight into the nature, structure, and issues of the dataset so as to identify the best ways of using it to achieve improved ML-based IDS models. Furthermore, some of the open problems that can arise from using the dataset in any ML-based IDS are highlighted, and possible solutions are briefly discussed. The EDA techniques revealed important relationships between the input variables and the target class. The study concluded that EDA can better inform the decisions of future IDS research that uses the dataset. Thus, improved ML-based intrusion detection systems can be built from the dataset once it is well understood and pre-processed.