Abstract
Optimizing the detection of intrusions is becoming more crucial due to the continuously rising rate and ferocity of cyber threats and attacks. One of the popular methods to optimize the accuracy of intrusion detection systems (IDSs) is to employ machine learning (ML) techniques. However, many factors affect the accuracy of ML-based IDSs. One of these factors is noise, which can take the form of mislabelled instances, outliers, or extreme values. Determining the extent of the effect of noise helps in designing and building more robust ML-based IDSs. This paper empirically examines the extent of the effect of noise on the accuracy of ML-based IDSs through a wide set of experiments. The ML algorithms used are decision tree (DT), random forest (RF), support vector machine (SVM), artificial neural networks (ANNs), and Naïve Bayes (NB). In addition, the experiments are conducted on two widely used intrusion datasets, NSL-KDD and UNSW-NB15. Moreover, the paper also investigates the use of these ML algorithms as base classifiers with two ensemble learning methods, bagging and boosting. The detailed results and findings are illustrated and discussed in this paper.
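As a rough, hedged illustration of this experimental setup (not the authors' code), the sketch below trains the five named classifiers with scikit-learn and reports test accuracy. The dataset file name and preprocessing are placeholders, and the stratified two-thirds/one-third split follows the highlights below; these are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the five-classifier comparison described in the
# abstract; the dataset path and preprocessing are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Placeholder: a preprocessed (numeric, encoded) intrusion dataset such as
# NSL-KDD or UNSW-NB15, with a binary 'label' column.
df = pd.read_csv("nsl_kdd_preprocessed.csv")
X, y = df.drop(columns=["label"]), df["label"]

# Two-thirds training / one-third testing, shared by all algorithms.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1 / 3, random_state=42, stratify=y
)

# SVM and ANN benefit from feature scaling; trees and NB use raw values.
scaler = StandardScaler().fit(X_tr)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "ANN": MLPClassifier(max_iter=500, random_state=42),
    "NB": GaussianNB(),
}

for name, clf in classifiers.items():
    if name in ("SVM", "ANN"):
        clf.fit(scaler.transform(X_tr), y_tr)
        preds = clf.predict(scaler.transform(X_te))
    else:
        clf.fit(X_tr, y_tr)
        preds = clf.predict(X_te)
    print(f"{name}: accuracy = {accuracy_score(y_te, preds):.4f}")
```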
Highlights
Intrusion detection systems (IDSs) are a form of technical security control that can be used to detect different forms of intrusions, malicious patterns, probing attempts, and unauthorized activities
All machine learning (ML) algorithms were trained on the same two-thirds of each dataset and tested on the same one-third that was held out for testing
The classification accuracy of ML-based IDSs is prone to several factors
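The abstract also mentions pairing these algorithms, as base classifiers, with bagging and boosting. A minimal sketch of that pairing follows, assuming scikit-learn's BaggingClassifier and AdaBoostClassifier with a decision tree base; the synthetic data is a stand-in for the preprocessed intrusion features, not the paper's actual setup.

```python
# Hypothetical bagging/boosting sketch; synthetic data stands in for a
# preprocessed intrusion dataset such as NSL-KDD or UNSW-NB15.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1 / 3, random_state=42, stratify=y
)

base = DecisionTreeClassifier(random_state=42)

# Note: `estimator=` requires scikit-learn >= 1.2 (older releases call the
# same parameter `base_estimator=`). Boosting often uses shallower trees
# in practice; a full-depth tree is used here only to mirror the DT base.
ensembles = {
    "Bagging(DT)": BaggingClassifier(estimator=base, n_estimators=50,
                                     random_state=42),
    "Boosting(DT)": AdaBoostClassifier(estimator=base, n_estimators=50,
                                       random_state=42),
}

for name, ens in ensembles.items():
    ens.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, ens.predict(X_te))
    print(f"{name}: accuracy = {acc:.4f}")
```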
Summary
KDDTest+ contains the full testing set, while KDDTest-21 is a subset of KDDTest+ that excludes records with a difficulty level of 21 out of 21. Many values of normal and malicious instances are almost the same in the UNSW-NB15 dataset, whereas there is a relatively reasonable difference between the normal and the malicious values in both KDD CUP 99 and NSL-KDD. The data distributions of the UNSW-NB15 training and testing sets are nearly the same, while they differ in KDD CUP 99 and NSL-KDD due to the existence of new attacks in the testing set, which helps to differentiate between normal and abnormal instances when running ML algorithms [8]. The fourth set of experiments entails conducting noise filtering by excluding noisy instances and then injecting different levels of noise: 5%, 10%, 20%, and 30%. This will help to study the influence of noise on ML algorithms run on intrusion datasets in the absence of outlier and extreme-value instances
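The summary does not spell out the noise-injection procedure, but a common way to simulate mislabelled instances is to flip a fixed fraction of binary labels uniformly at random. The sketch below, built around the hypothetical helper inject_label_noise, illustrates the four noise levels listed above; it is one plausible reading, not necessarily the authors' exact method.

```python
# Hypothetical label-noise injection at the rates listed above
# (5%, 10%, 20%, 30%); flipping binary labels uniformly at random is one
# common way to simulate mislabelled instances.
import numpy as np

def inject_label_noise(y, rate, rng=None):
    """Flip a `rate` fraction of binary (0/1) labels chosen at random."""
    rng = rng or np.random.default_rng(42)
    y_noisy = np.asarray(y).copy()
    n_flip = int(rate * len(y_noisy))
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]  # flip 0 -> 1 and 1 -> 0
    return y_noisy

# Usage: inject each noise level into (stand-in) training labels, then
# retrain and re-evaluate each classifier on the clean test set.
y_tr = np.random.default_rng(0).integers(0, 2, size=1000)
for rate in (0.05, 0.10, 0.20, 0.30):
    y_noisy = inject_label_noise(y_tr, rate)
    flipped = np.mean(y_noisy != y_tr)
    print(f"{rate:.0%} noise -> {flipped:.2%} labels flipped")
```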