Increasing the Performance of Machine Learning-Based IDSs on an Imbalanced and Up-to-Date Dataset

Gozde Karatas, Ozgur Koray Sahingoz, Onder Demir

doi:10.1109/access.2020.2973219

Open Access

https://doi.org/10.1109/access.2020.2973219

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 174
License type: CC BY 4.0

Affiliation: Marmara University

Abstract

In recent years, the extensive use of the Internet has steadily increased the number of networked computers in daily life. Server weaknesses enable attackers to intrude on computers using not only known attack types but also new ones, which are more sophisticated and harder to detect. One of the most widely preferred protection mechanisms is the Intrusion Detection System (IDS), which is trained with machine learning techniques on a pre-collected dataset. The datasets commonly used were collected over a limited period in specific networks and generally do not contain up-to-date data. They are also imbalanced and do not hold sufficient data for all attack types. These imbalanced and outdated datasets decrease the efficiency of current IDSs, especially for rarely encountered attack types. In this paper, we propose six machine-learning-based IDSs using the K Nearest Neighbor, Random Forest, Gradient Boosting, Adaboost, Decision Tree, and Linear Discriminant Analysis algorithms. To implement a more realistic IDS, an up-to-date security dataset, CSE-CIC-IDS2018, is used instead of older, heavily studied datasets. The selected dataset is also imbalanced. Therefore, to increase the efficiency of the system across attack types and to decrease missed intrusions and false alarms, the imbalance ratio is reduced with a synthetic data generation model called Synthetic Minority Oversampling TEchnique (SMOTE). Data generation is performed for the minority classes, and their sample counts are increased to the average data size via this technique. Experimental results demonstrate that the proposed approach considerably increases the detection rate for rarely encountered intrusions.
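
The oversampling step described in the abstract can be illustrated with a minimal pure-Python sketch of SMOTE-style interpolation. It is not the authors' implementation. The function name `smote_to_average`, the parameter `k`, the toy labels, and the reading of "average data size" as the mean number of samples per class are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def smote_to_average(X, y, k=5, seed=0):
    """Oversample each class smaller than the mean class size up to that mean.

    X is a list of feature vectors (lists of floats), y a list of labels.
    Returns new (X, y) lists with the synthetic rows appended at the end.
    Assumption: "average data size" is taken as total samples / class count.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Target size: average number of samples per class, rounded down.
    target = len(X) // len(by_class)

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        deficit = target - len(rows)
        if deficit <= 0 or len(rows) < 2:
            continue  # class is already large enough, or cannot be interpolated
        for _ in range(deficit):
            base = rng.choice(rows)
            # Up to k nearest same-class neighbours of the base point.
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position on the segment from base to other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): 10 "benign" rows and 2 rows of a rare attack class.
X_demo = [[float(i), 0.0] for i in range(10)] + [[0.0, 10.0], [2.0, 10.0]]
y_demo = ["benign"] * 10 + ["rare_attack"] * 2
X_bal, y_bal = smote_to_average(X_demo, y_demo, k=5, seed=1)
```

In this toy case the mean class size is 6, so the rare class grows from 2 to 6 rows while the majority class is left untouched. Each synthetic row lies on the line segment between two existing rare-class rows, which is the core idea of SMOTE. A production pipeline would more likely use an established library implementation and apply oversampling to the training split only.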

Highlights

Due to technological developments, most of the real-world transactions have been made available in the cyber world
With the widespread use of smartphones, people can connect to this global network and perform transactions at any time and from anywhere
The results showed that the Decision Tree (DT) achieved a lower false positive rate and a higher true positive rate than the Support Vector Machine (SVM): the true positive rate was 99.86% for DT and 99.62% for SVM, and the reported false-negative rate was 0.05% for DT and 0.09% for SVM
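
For readers less familiar with the rates quoted in these highlights, the following sketch shows how true positive, false positive, and false-negative rates are commonly derived from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def detection_rates(tp, fp, tn, fn):
    """Return standard binary detection rates from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: benign flows flagged as attacks tn: benign flows passed
    """
    return {
        "tpr": tp / (tp + fn),  # true positive rate (detection rate, recall)
        "fpr": fp / (fp + tn),  # false positive rate (false alarms)
        "fnr": fn / (tp + fn),  # false-negative rate (missed intrusions)
    }


# Hypothetical counts for illustration only.
rates = detection_rates(tp=90, fp=5, tn=95, fn=10)
```

By these definitions the true positive rate and the false-negative rate of a class always sum to 1, while the false positive rate is computed over the benign traffic.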

Summary

Introduction

Most real-world transactions have been made available in the cyber world. With the widespread use of smartphones, people can connect to this global network and perform transactions at any time and from anywhere. Although this digitalization facilitates people's daily work, server weaknesses and newly emerging intrusion techniques leave networks exposed to attack. Security administrators traditionally rely on password protection mechanisms, encryption techniques, and access controls, in addition to firewalls, to protect the network. These techniques alone are not sufficient to protect the system.

