Abstract

An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) and deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies several issues in related work, including the seemingly random selection of algorithms, parameters, and testing criteria; the use of outdated datasets; and shallow analysis and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to establish benchmarking outcomes that can reveal suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms to identify effective and efficient ML-AIDS for networks and computers. The supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing map (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive rate, true negative rate, accuracy, precision, recall, and F-Score of 31 ML-AIDS models. The training and testing times of the ML-AIDS models are also considered in measuring their efficiency, given that time complexity is an important factor in AIDS. The ML-AIDS models are tested on the recent, highly unbalanced, multiclass CICIDS2017 dataset, which involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks than the other models, which yield irregular and inferior results.
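
The evaluation measures named in the abstract can all be derived from the four cells of a binary confusion matrix (attack vs. benign). The following is a minimal sketch of those standard definitions, not the authors' evaluation code. It assumes that "F-Score" means the F1 score (the abstract does not state a beta value), and the names `ConfusionCounts` and `binary_metrics`, as well as the counts in the usage line, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix cells; 'positive' means 'attack'."""
    tp: int  # attacks correctly flagged
    tn: int  # benign flows correctly passed
    fp: int  # benign flows wrongly flagged
    fn: int  # attacks that were missed


def _safe_div(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the six measures listed in the abstract."""
    recall = _safe_div(c.tp, c.tp + c.fn)      # identical to the true positive rate
    tnr = _safe_div(c.tn, c.tn + c.fp)         # true negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    # Assumption: F-Score is F1, the harmonic mean of precision and recall.
    f_score = _safe_div(2 * precision * recall, precision + recall)
    return {
        "tpr": recall,
        "tnr": tnr,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only (not results from the paper).
example = binary_metrics(ConfusionCounts(tp=80, tn=900, fp=10, fn=20))
```

On an unbalanced dataset, accuracy alone can look high even when many attacks are missed, which is one reason to report recall, precision, and F-Score alongside it. For a multiclass dataset, these measures would typically be computed per class in a one-vs-rest fashion and then averaged.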

Highlights

  • As more platforms and applications are being connected to networks, data become increasingly vulnerable to malicious attacks

  • The machine learning (ML) anomaly-based IDS (AIDS) algorithms are implemented in Python 3 with Anaconda 3 on a Dell OptiPlex 3010 computer with an Intel Core i3 3.60 GHz processor, 4 GB of primary memory, and a 2 GB GPU, running Ubuntu 16.04

  • This study proposes a benchmarking approach that involves several steps and uses real data to ensure an effective evaluation of AIDS performance based on ML algorithms

Summary

Introduction

As more platforms and applications are connected to networks, data become increasingly vulnerable to malicious attacks. Using an intrusion detection system (IDS) is a well-known approach for protecting computer networks. Two popular types of IDS, namely, network-based IDS (NIDS) and host-based IDS (HIDS), have been adopted in practice. NIDS monitors network traffic and detects malicious activity in the network by analyzing the activities of end users [2]. IDS applies two types of detection methods, namely, signature-based and anomaly-based methods. Signature-based IDS (SIDS) detects attacks by identifying known patterns (i.e., signatures) [3]. While this method can detect known malware and attacks based on their
