Abstract

An intrusion detection system (IDS) is an essential part of cyber security: it captures and inspects network traffic to distinguish legitimate from malicious activity and to determine the type of attack. The choice of dataset used to train a machine-learning-based IDS is crucial to ensuring that the IDS classifies cyber-attacks accurately. When multiple datasets are used in training, the metric values depend numerically on the specific pairing of ML algorithm and dataset. Previous research reported a major decline in metrics under inter-dataset evaluation, that is, when a model trained on one dataset is tested on another. This research thoroughly investigates the use of two of the most recent and comprehensive IDS datasets, CIC-IDS-2017 and CSE-CIC-IDS2018, to design and evaluate a machine-learning-based IDS built on a hybrid CNN-LSTM architecture. The new approach is to generate a new dataset by mixing the two datasets. Experiments showed that training on the mixed dataset yielded superior metric values compared with training on either individual dataset, especially under inter-dataset evaluation, thereby overcoming the generalization problem.
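The abstract does not specify how the two datasets were merged or how the inter-dataset splits were formed. As a hedged illustration only, the Python sketch below (standard library only) shows one way such an evaluation protocol could be organized: harmonize column names so both datasets share one schema, then build intra-dataset, inter-dataset, and mixed-dataset train/test pairs. The column mapping `RENAME_2018_TO_2017`, the 80/20 split, and all function names are hypothetical assumptions rather than the authors' method, and the CNN-LSTM model itself is not shown.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]
Split = Tuple[List[Record], List[Record]]  # (train, test)

# Hypothetical column-name mapping between the two datasets; the real
# CSV headers must be checked and mapped before mixing.
RENAME_2018_TO_2017 = {"Dst Port": "Destination Port"}


def harmonize(records: List[Record], rename: Dict[str, str]) -> List[Record]:
    """Rename columns so that both datasets share a single feature schema."""
    return [{rename.get(k, k): v for k, v in r.items()} for r in records]


def train_test_split(records: List[Record], test_frac: float, seed: int) -> Split:
    """Shuffle a copy of the records and hold out test_frac of them for testing."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]


def evaluation_scenarios(ds17: List[Record], ds18: List[Record],
                         test_frac: float = 0.2, seed: int = 0) -> Dict[str, Split]:
    """Build intra-dataset, inter-dataset and mixed-dataset (train, test) pairs."""
    tr17, te17 = train_test_split(ds17, test_frac, seed)
    tr18, te18 = train_test_split(ds18, test_frac, seed)
    # Mix only the training portions, so no test row is ever seen in training.
    mixed_train = tr17 + tr18
    random.Random(seed).shuffle(mixed_train)
    return {
        "intra_2017": (tr17, te17),
        "intra_2018": (tr18, te18),
        "inter_2017_to_2018": (tr17, te18),  # train on one dataset, test on the other
        "inter_2018_to_2017": (tr18, te17),
        "mixed_to_2017": (mixed_train, te17),  # train on the mixture, test on held-out rows
        "mixed_to_2018": (mixed_train, te18),
    }


if __name__ == "__main__":
    # Toy stand-ins for the real datasets (hypothetical feature and label values).
    ds17 = [{"Destination Port": 80, "Label": "benign"} for _ in range(10)]
    ds18 = harmonize([{"Dst Port": 443, "Label": "attack"} for _ in range(10)],
                     RENAME_2018_TO_2017)
    for name, (train, test) in evaluation_scenarios(ds17, ds18).items():
        print(name, len(train), len(test))
```

Holding out each dataset's test rows before mixing keeps the mixed training set free of any rows later used for evaluation, so scores from the mixed scenarios stay comparable with the intra- and inter-dataset ones.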
