Abstract

Machine Learning is widely used in cybersecurity for detecting network intrusions. Although network attacks are increasing steadily, they make up only a small fraction of actual network traffic, and herein lies the difficulty of training Machine Learning models to detect and classify malicious attacks within routine traffic. Because benign records vastly outnumber actual attacks, the resulting datasets are highly imbalanced. In this work, we address this issue using data resampling techniques. While several oversampling and undersampling techniques are available, this paper addresses how they can be used most effectively. Two oversampling techniques, Borderline SMOTE and SVM-SMOTE, are used to oversample the minority data, and random undersampling is used to undersample the majority data. Both oversampling techniques apply KNN after selecting a random minority sample point, so the impact of varying the number of nearest neighbors (k) on the performance of the oversampling technique is also analyzed. Random Forest is used for classification of the rare attacks. The work is carried out on a widely used cybersecurity dataset, UNSW-NB15, and the results show that 10% oversampling gives better results for both Borderline SMOTE and SVM-SMOTE.
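The abstract outlines a resampling-plus-classification pipeline. The sketch below shows one way such a pipeline could be assembled with the imbalanced-learn and scikit-learn libraries; the file name, column names, sampling ratios, and k value are illustrative assumptions, not values taken from the paper, and a preprocessed, numeric-only, binary-labelled copy of UNSW-NB15 is assumed.

```python
# Hedged sketch: oversample minority (attack) records, undersample majority
# (benign) records, then classify with Random Forest. Assumes a hypothetical
# preprocessed CSV "unsw_nb15_preprocessed.csv" with a binary "label" column.
import pandas as pd
from imblearn.over_sampling import BorderlineSMOTE  # SVMSMOTE can be swapped in
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("unsw_nb15_preprocessed.csv")  # hypothetical path
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Oversample attacks with Borderline SMOTE; k_neighbors is the KNN parameter
# whose effect the paper analyzes. sampling_strategy=0.1 grows the minority
# class to 10% of the majority (assumes it starts below that ratio).
oversampler = BorderlineSMOTE(sampling_strategy=0.1, k_neighbors=5, random_state=42)
X_res, y_res = oversampler.fit_resample(X_train, y_train)

# Randomly undersample the benign majority class (illustrative ratio).
undersampler = RandomUnderSampler(sampling_strategy=0.5, random_state=42)
X_res, y_res = undersampler.fit_resample(X_res, y_res)

# Classify the rebalanced data with a Random Forest.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```

To compare the two oversampling techniques, the BorderlineSMOTE step would simply be replaced with imblearn's SVMSMOTE under the same sampling ratio.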
