Detecting cybersecurity attacks across different network features and learners

Joffrey L Leevy,Taghi M Khoshgoftaar,John Hancock,Richard Zuech

doi:10.1186/s40537-021-00426-w

https://doi.org/10.1186/s40537-021-00426-w

Journal: Journal of Big Data
Publication Date: Feb 23, 2021
Citations: 35
License type: open-access

Affiliation: Florida Atlantic University

Abstract

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier (Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost) significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative, and the answers provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
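To make the idea of ensemble feature selection concrete, the sketch below combines several feature rankings by simple voting: a feature is kept if it appears near the top of enough rankings. This is one common way to build such an ensemble and is not necessarily the procedure used in the study. The function name ensemble_select, the parameters top_k and min_votes, and every feature name except Destination_Port are assumptions made for illustration.

```python
from collections import Counter


def ensemble_select(rankings, top_k, min_votes):
    """Keep features that appear in the top_k of at least min_votes rankings.

    Illustrative voting scheme only; not taken from the paper.
    """
    votes = Counter()
    for ranking in rankings:
        # Each ranking lists feature names from most to least important.
        # A set ensures a ranker votes at most once per feature.
        votes.update(set(ranking[:top_k]))
    # Order by descending vote count, then by name, so the result is deterministic.
    return sorted(
        (name for name, count in votes.items() if count >= min_votes),
        key=lambda name: (-votes[name], name),
    )


# Hypothetical rankings from three feature rankers (feature names are made up,
# apart from Destination_Port, which the abstract mentions).
rankings = [
    ["Destination_Port", "Flow_Duration", "Fwd_Pkt_Len", "Bwd_Pkt_Len"],
    ["Flow_Duration", "Destination_Port", "Init_Win_Bytes", "Fwd_Pkt_Len"],
    ["Init_Win_Bytes", "Flow_Duration", "Bwd_Pkt_Len", "Destination_Port"],
]

selected = ensemble_select(rankings, top_k=2, min_votes=2)
print(selected)
```

Raising min_votes makes the selection stricter (fewer features, more agreement among rankers), while raising top_k makes it more permissive.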

Highlights

CSE-CIC-IDS2018 [1], referred to as the 2018 dataset throughout this text, is an intrusion detection dataset with normal and anomalous instances of network traffic.
Our contribution is defined by our responses to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier (Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost) significantly impact performance in terms of AUC and F1-score?” The answers to these research questions provide valuable and practical information for the development of an efficient intrusion detection model.
The first results we report are the mean AUC and F1-scores for CatBoost, LightGBM, and CategoricalNB with their respective datasets.
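Both metrics named in the research questions can be computed from labels and classifier outputs. The following is a minimal pure-Python sketch under stated assumptions: labels are binary (1 for attack, 0 for normal), AUC is computed as the fraction of attack/normal pairs that the scores order correctly (ties count as half), and F1 uses predictions obtained from a 0.5 threshold. The toy labels, scores, and threshold are invented for illustration and are not results from the paper.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    # Quadratic pairwise comparison: fine for a toy example, too slow for big data.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = attack, 0 = normal traffic.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

auc_value = auc(y_true, scores)
f1_value = f1_score(y_true, y_pred)
print(round(auc_value, 3), round(f1_value, 3))
```

AUC depends only on how the scores rank the instances, whereas F1 depends on the chosen threshold, which is one reason the two metrics are often reported together for imbalanced data such as intrusion detection traffic.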

Summary

Introduction

CSE-CIC-IDS2018 [1], referred to as the 2018 dataset throughout this text, is an intrusion detection dataset with normal and anomalous instances of network traffic. Machine learning models efficiently trained on CSE-CIC-IDS2018 can detect network traffic capable of compromising an information system. This dataset is the most recent iteration of ISCXIDS2012 [2], a scalable project designed to produce modern, realistic datasets. CSE-CIC-IDS2018 data originated from an extensive network of victim and attack machines [3], yielding an aggregate of 16,233,002 instances. Six classes of attack traffic (percentage distribution shown in Table 1) are represented by about 17% of these instances. The dataset is distributed over ten CSV files that are

Similar Papers

Detecting Cybersecurity Attacks Using Different Network Features with LightGBM and XGBoost Learners
Joffrey L. Leevy ... John Hancock
01 Oct 2020

IoT Reconnaissance Attack Classification with Random Undersampling and Ensemble Feature Selection
Joffrey L Leevy ... John Hancock
01 Dec 2021

Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches
Chih-Fong Tsai ... Ya-Ting Sung
Knowledge-Based Systems | VOL. 203
01 Jun 2020

An Ensemble Feature Selection Method for Prediction of CKD
M Manonmani ... Sarojini Balakrishnan
01 Jan 2020
