Machine Learning Algorithms and Datasets for Modern IDS Design

Inam Abdullah Abdulmajeed, Idress Mohammed Husien

doi:10.1109/cyberneticscom55287.2022.9865255

Abstract

An Intrusion Detection System (IDS) is a critical component of cyber security: it captures and analyzes traffic, differentiates benign from malicious traffic, and indicates the attack type. This review investigates the Machine Learning (ML) algorithms used in IDS design, with particular focus on the datasets used. The parameters used to compare the performance of each algorithm are also studied. Dataset choice is critical, because the dataset must match the IDS requirements, and the structure of the dataset can strongly influence the selection of the ML algorithm. Hence, a performance metric provides a numerical relation between an ML algorithm and a specific dataset. This review concludes that researchers are moving away from Supervised Learning toward Clustering and other algorithms, which gives hope that future IDSs will be able to detect more unknown and zero-day attacks; the share of studies using hybrid algorithms has also increased dramatically. Meanwhile, recent ML research depends more and more on modern datasets, which is a significant consideration in IDS design. Some research articles nevertheless still treat KDDCup99 and its reduced variant as the principal training dataset for IDSs, even though it is more than 20 years old and cyber threats keep rising alongside the adoption of new technologies such as cloud computing, IoT, and IPv6.

Full Text