Analysis on Network Traffic Features for Designing Machine Learning based IDS

N Meemongkolkiat, V Suttichaya

doi:10.1088/1742-6596/1993/1/012029

Open Access

Journal: Journal of Physics: Conference Series
Publication Date: Aug 1, 2021
Citations: 4
License type: CC BY

Affiliation: Mahidol University

Abstract

An intrusion detection system (IDS) is the most important technology for securing network systems. It dynamically monitors network traffic for malicious activities that aim to violate the confidentiality, integrity, authenticity, and availability of the network. Several Machine Learning (ML) techniques are currently used to design and implement IDSs, since ML techniques can capture the complex nature of cyberattacks. However, network traffic data usually contains unimportant features that can degrade the efficacy of an ML-based IDS. This research analyses the critical features in network traffic that should be used when designing and implementing an effective ML-based IDS. The selected features are applied to different ML methods to test their effectiveness. The research is conducted on the CICIDS2017 dataset generated by the Canadian Institute for Cybersecurity, using 30 percent of the full dataset and 100 percent of the Wednesday set. The best result on the 30 percent sample of the full set is obtained with 30 selected features and the Bagging ensemble classifier, giving an accuracy of 99.9 percent with a false-positive rate (FPR) as low as 0.03 percent. The best result on the Wednesday set is obtained with the Random Forest classifier, which achieves an accuracy of 99.9 percent and an FPR of 0.02 percent.
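
The two figures reported above, accuracy and false-positive rate, can be illustrated with a minimal sketch. It assumes traffic has been collapsed to binary labels (0 = benign, 1 = attack), which is a simplification, since CICIDS2017 contains several attack classes. The abstract does not say how the authors computed these metrics, so the function name `accuracy_and_fpr` and the toy labels are hypothetical and purely illustrative.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, false-positive rate) for binary labels.

    Assumed encoding (not from the paper): 0 = benign, 1 = attack.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # attack correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # benign flow wrongly flagged as an attack
        elif truth == 0 and pred == 0:
            tn += 1  # benign flow correctly passed
        else:
            fn += 1  # attack missed

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR is the share of benign flows that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={acc:.3f}, FPR={fpr:.3f}")
```

Because benign flows typically dominate real network traffic, accuracy alone can look high even when many alerts are false alarms, which is presumably why the FPR is reported alongside it.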

Full Text