Abstract

This study investigated the impact of data quality on the performance of intrusion detection systems. Experiments were conducted using intrusion datasets and machine learning models. Pre-trained models were less affected by data duplication and overlap than classic ML models. Removing overlaps and duplicates from the training data improved the pre-trained models' performance in most cases, but had adverse effects on datasets with highly similar sequences. The study also proposed a framework for model selection and data quality assurance for building high-quality intrusion detection systems. Additionally, we focus on optimizing nine hyperparameters of a 1D-CNN model using two well-established evolutionary computation methods: genetic algorithm (GA) and particle swarm optimization (PSO). The performance of these methods is assessed on three major datasets: UNSW-NB15, CIC-IDS2017, and NSL-KDD. The key performance metrics considered are accuracy, loss, precision, recall, and F1-score. The results demonstrate considerable improvements in all metrics across all datasets for both GA- and PSO-optimized models compared with the original non-optimized 1D-CNN model.
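
To make the data-cleaning step concrete, the following is a minimal sketch of removing duplicates and overlaps from training data. It assumes that "duplicate" means an exact repeat within the training split and "overlap" means a training record that also appears in the test split; the abstract does not define either term precisely. The function names and the toy records are hypothetical and are not taken from the study.

```python
from typing import Hashable, List, Sequence


def drop_duplicates(records: Sequence[Hashable]) -> List[Hashable]:
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def remove_overlap(train: Sequence[Hashable],
                   test: Sequence[Hashable]) -> List[Hashable]:
    """Drop training records that also appear in the test split."""
    test_set = set(test)
    return [record for record in train if record not in test_set]


def clean_training_split(train: Sequence[Hashable],
                         test: Sequence[Hashable]) -> List[Hashable]:
    """Assumed cleaning order: deduplicate first, then remove overlap."""
    return remove_overlap(drop_duplicates(train), test)


# Hypothetical toy records: (protocol, destination port, label).
train = [
    ("tcp", 80, "normal"),
    ("tcp", 80, "normal"),   # exact duplicate inside the training split
    ("udp", 53, "attack"),   # also present in the test split
    ("tcp", 22, "attack"),
]
test = [("udp", 53, "attack"), ("tcp", 443, "normal")]

cleaned = clean_training_split(train, test)
```

Exact matching is the simplest choice here. For datasets with highly similar sequences, where the abstract reports adverse effects from removal, a looser similarity criterion or no removal at all may be the better option.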
