Abstract

As numerous Internet-of-Things (IoT) devices are deployed every day, network intrusion detection systems (NIDS) are among the most critical tools for protecting networks against malicious cyberattacks. This paper employs four machine learning algorithms (XGBoost, random forest, decision tree, and gradient boosting) and evaluates their performance in NIDS in terms of accuracy, precision, recall, and F-score. The comparative analysis, conducted on the CICIDS2017 dataset, shows that XGBoost outperforms the other algorithms, reaching a prediction accuracy of 99.6% in detecting cyberattacks. XGBoost-based attack detectors also achieve the highest weighted F1-score, precision, and recall. The paper also studies the effect of class imbalance and of the sizes of the normal and attack classes. The small number of samples for some attacks in the training datasets biases the classifiers toward the majority classes, which becomes a bottleneck for improving macro recall and macro F1-score. The results help network engineers choose the most effective machine learning-based NIDS to secure today's growing IoT network traffic.
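The gap between weighted and macro scores described above can be made concrete with a small, self-contained sketch. It uses plain Python rather than the models from the paper, and the toy labels (990 benign flows and 10 flows of one rare attack class) and the helper names `per_class_metrics` and `macro_and_weighted` are illustrative assumptions, not data or code from this study. Macro averaging takes the unweighted mean of the per-class scores, while weighted averaging weights each class by its support (its number of true samples); these are the same definitions scikit-learn uses for `average="macro"` and `average="weighted"`.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label in y_true."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1, tp + fn)  # support = true samples of label
    return metrics


def macro_and_weighted(y_true, y_pred):
    """Return ((P, R, F1) macro-averaged, (P, R, F1) support-weighted)."""
    metrics = per_class_metrics(y_true, y_pred)
    total = sum(m[3] for m in metrics.values())
    n_classes = len(metrics)
    # Macro: every class counts equally, so a rare attack weighs as much as BENIGN.
    macro = tuple(sum(m[i] for m in metrics.values()) / n_classes for i in range(3))
    # Weighted: each class counts in proportion to its support, so BENIGN dominates.
    weighted = tuple(sum(m[i] * m[3] for m in metrics.values()) / total for i in range(3))
    return macro, weighted


# Toy, hypothetical labels: 990 benign flows and 10 flows of one rare attack class.
y_true = ["BENIGN"] * 990 + ["rare_attack"] * 10
# A majority-biased classifier: all benign flows correct, only 2 of 10 rare flows caught.
y_pred = ["BENIGN"] * 990 + ["BENIGN"] * 8 + ["rare_attack"] * 2

macro, weighted = macro_and_weighted(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy        = {accuracy:.3f}")
print("macro    P/R/F1 =", [round(x, 3) for x in macro])
print("weighted P/R/F1 =", [round(x, 3) for x in weighted])
```

In this toy case the classifier reaches a weighted recall of 0.992 (equal to its accuracy) but a macro recall of only 0.6, and its macro F1-score is about 0.66 against a weighted F1-score of about 0.99, because the poorly detected rare class counts as much as the benign class in the macro mean.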

Highlights

  • Network intrusion detection systems play an important role as the dramatic growth of the Internet of Things (IoT) exposes new vulnerabilities in networks

  • This paper evaluates the effect of class imbalance on detection metrics for real network attacks and investigates the impact of training data size on the performance of the ML models

  • Figure 4 provides a visual comparison of how each model from each pipeline performs on the various metrics


Summary

Introduction

Network intrusion detection systems play an important role as the dramatic growth of the Internet of Things (IoT) exposes new vulnerabilities in networks. The Intrusion Detection Evaluation Dataset (CICIDS2017), available from the Canadian Institute for Cybersecurity, contains the most up-to-date real-world network attack scenarios [21]. This dataset has ~600,000 benign samples from normal traffic, while only 11 and 21 samples are recorded for two attack types on a specific day. It is therefore critical to evaluate the performance of ML algorithms for different sizes of training data for known attacks. This paper evaluates the effect of such class imbalance on the detection metrics for real network attacks and investigates the impact of training data size on the performance of the ML models.
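To see why training data size matters so much for the rarest attacks, the following sketch subsamples each class to a given percentage of its size and reports how many samples survive. This is an illustrative assumption, not the sampling procedure used in the paper: the class counts reuse the approximate figures quoted above, while the labels `attack_a` and `attack_b` and the helpers `stratified_subsample_counts` and `imbalance_ratio` are hypothetical names introduced here. Integer arithmetic is used so the kept counts are not affected by floating-point rounding.

```python
import math

# Approximate class sizes quoted in the text; the attack labels are hypothetical placeholders.
CLASS_COUNTS = {"BENIGN": 600_000, "attack_a": 11, "attack_b": 21}


def stratified_subsample_counts(class_counts, percent, min_per_class=0):
    """Samples kept per class when each class is subsampled to `percent`% of its size.

    `min_per_class` optionally guarantees that a class keeps at least that many
    samples (but never more than it actually has).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    kept = {}
    for label, count in class_counts.items():
        k = count * percent // 100  # floor, in exact integer arithmetic
        kept[label] = max(k, min(min_per_class, count))
    return kept


def imbalance_ratio(class_counts):
    """Majority-to-minority class ratio; infinite if some class has no samples left."""
    smallest = min(class_counts.values())
    if smallest == 0:
        return math.inf
    return max(class_counts.values()) / smallest


for percent in (100, 50, 10, 5):
    kept = stratified_subsample_counts(CLASS_COUNTS, percent)
    print(f"{percent:>3}% -> {kept}, imbalance ratio = {imbalance_ratio(kept):.0f}")
```

With these counts, a 10% subsample keeps 60,000 benign samples but only 1 and 2 samples of the two rare attacks, and at 5% the 11-sample class disappears from the training set entirely, so no classifier could learn it. Setting `min_per_class` is one simple way to keep rare classes represented when experimenting with smaller training sets.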

Training process
Results and discussion
Conclusion