Study of Multi-Class Classification Algorithms’ Performance on Highly Imbalanced Network Intrusion Datasets

Viktoras Bulavas, Jacek Rumiński, Virginijus Marcinkevičius

doi:10.15388/21-infor457

Abstract

This paper addresses the problem of class imbalance in machine learning, focusing on the detection of rare intrusion classes in computer networks. Class imbalance occurs when examples of one class heavily outnumber those of the other classes. We are particularly interested in classifiers, since pattern recognition and anomaly detection can both be treated as classification problems. Because most of the traffic in any organization's network is still benign and malignant traffic is rare, researchers have to deal with a class imbalance problem. Substantial research has been undertaken to identify the methods or data features that allow these attacks to be identified accurately. The usual tactic for dealing with class imbalance, however, is to label all malignant traffic as one class and then solve a binary classification problem. In this paper we instead choose not to group or drop rare classes, and investigate what can be done to achieve good multi-class classification efficiency. Rare class records were up-sampled using the SMOTE method (Chawla et al., 2002) to preset ratio targets. Experiments were performed on three network traffic datasets, namely CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), with the aim of achieving reliable recognition of the rare malignant classes available in these datasets. Popular machine learning algorithms were chosen to compare their readiness to support rare class detection. The algorithms' hyperparameters were tuned over a wide range of values, different feature selection methods were used, and tests were executed with and without over-sampling to assess multi-class classification performance on rare classes.
A ranking of the machine learning algorithms based on Precision, Balanced Accuracy Score, Ḡ, and the Bias and Variance decomposition of prediction error shows that decision tree ensembles (Adaboost, Random Forest Trees and Gradient Boosting Classifier) performed best on the network intrusion datasets used in this research.
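The up-sampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea (Chawla et al., 2002): each synthetic record is placed on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration and not the authors' implementation; the function names `smote_sketch` and `upsample_to_ratio`, the neighbour count `k`, and the way the ratio target is defined (minority count relative to majority count) are all assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def upsample_to_ratio(minority, majority_count, target_ratio, k=5, seed=0):
    """Up-sample until len(minority) / majority_count reaches target_ratio (assumed definition)."""
    target = math.ceil(target_ratio * majority_count)
    n_new = max(0, target - len(minority))
    if n_new == 0:
        return list(minority)
    return list(minority) + smote_sketch(minority, n_new, k=k, seed=seed)
```

Because every synthetic point is a convex combination of two real minority records, the new records stay inside the region already occupied by the rare class. In a real experiment the over-sampling would be applied only to the training portion of the data, so that held-out records remain genuine.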

Highlights

Detection of intrusions into networks, information systems or workstations, as well as detection of malware and unauthorized activities of individuals, has emerged as a global challenge.
Multi-criteria scoring is cross-validated by testing the models on previously unseen data.
We studied three highly imbalanced network intrusion datasets and proposed methodology steps that help achieve high classification results for rare classes; the results were validated through model error decomposition and a 50% data hold-out strategy.
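The 50% hold-out mentioned in the highlights can be sketched as a split that keeps half of each class aside for testing. Whether the authors stratified the split by class is not stated in this excerpt, so the stratification here is an assumption, and `stratified_holdout` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.5, seed=0):
    """Return (train_idx, test_idx) with each class split in roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        # keep at least one example on each side when the class has two or more records
        if len(indices) >= 2:
            n_test = min(max(n_test, 1), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting per class matters for rare classes: a purely random 50% split of a class with only a handful of records could leave all of them on one side, making it impossible either to learn or to evaluate that class.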

Summary

Introduction

Detection of intrusions into networks, information systems or workstations, as well as detection of malware and unauthorized activities of individuals, has emerged as a global challenge. There are three methods of intrusion detection (Koch, 2011): known pattern recognition (signature-based), anomaly-based detection, and a hybrid of the two. Pattern recognition and anomaly detection can both be seen as classification problems. In network traffic, benign data is most often represented by a large number of examples, while malignant traffic appears rarely or extremely rarely. This is known as the class imbalance problem, and it is a known obstacle to the induction of good classifiers by Machine Learning (ML) algorithms (Batista et al., 2004).
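A small worked example shows why imbalance is an obstacle and why plain accuracy is a poor guide. The sketch below scores a classifier that labels everything as benign. The class names and counts are toy values invented for illustration, and reading Ḡ as the geometric mean of per-class recalls is an assumption, since the exact definition is not given in this excerpt.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted records / records of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed reading of the G-bar score)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy data: 990 benign records and two rare attack classes (illustrative only).
y_true = ["benign"] * 990 + ["dos"] * 8 + ["botnet"] * 2
y_pred = ["benign"] * 1000  # a classifier that never predicts an attack

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, balanced_accuracy(y_true, y_pred), g_mean(y_true, y_pred))
```

The trivial classifier reaches 99% accuracy while detecting no attacks at all; its balanced accuracy is only one third and its geometric mean is zero, because a single class with zero recall drives the product to zero. Scores of this kind therefore reward models that recognise every class, including the rare ones.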



Journal: Informatica	Publication Date: Jan 1, 2021
Citations: 11	License type: cc-by
