Models versus Datasets: Reducing Bias through Building a Comprehensive IDS Benchmark

Rasheed Ahmad, Lo’Ai Tawalbeh, Wasim Alhamdani, Izzat Alsmadi

doi:10.3390/fi13120318

Abstract

Today, deep learning approaches are widely used to build Intrusion Detection Systems (IDS) for securing IoT environments. However, the models’ hidden and complex nature raises various concerns, such as whether the model output can be trusted and why the model made certain decisions. Researchers generally publish their proposed model’s settings and performance results based on a specific dataset and a classification model but do not report the proposed model’s output and findings. Similarly, many researchers suggest an IDS solution by focusing only on a single benchmark dataset and classifier. Such solutions are prone to generating inaccurate and biased results. This paper addresses these limitations of previous work by analyzing various benchmark datasets and various individual and hybrid deep learning classifiers to find the best IDS solution for IoT that is efficient, lightweight, and comprehensive in detecting network anomalies. We also show the model’s localized predictions and analyze the top contributing features impacting the global performance of deep learning models. This paper aims to extract the aggregate knowledge from various datasets and classifiers and analyze the commonalities to avoid any possible bias in results and increase the trust and transparency of deep learning models. We believe this paper’s findings will help future researchers build a comprehensive IDS based on well-performing classifiers and utilize the aggregated knowledge and the minimum set of significantly contributing features.
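The idea of analyzing commonalities across datasets and classifiers can be illustrated with a small sketch. This is not the authors' code: the dataset labels, model labels, feature names, and importance scores below are all hypothetical placeholders invented for illustration, and the rule used (keep a feature if it appears in the top-k list of a minimum share of runs) is only one possible way to aggregate.

```python
from collections import Counter

def top_k(importances, k):
    """Return the k feature names with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {name for name, _ in ranked[:k]}

def common_features(runs, k, min_share=1.0):
    """Features that appear in the top-k list of at least min_share of the runs."""
    counts = Counter()
    for importances in runs.values():
        counts.update(top_k(importances, k))
    needed = min_share * len(runs)
    return sorted(name for name, count in counts.items() if count >= needed)

# Hypothetical (dataset, classifier) runs with made-up importance scores.
runs = {
    ("dataset_A", "model_1"): {"flow_duration": 0.40, "dst_port": 0.35, "pkt_len_mean": 0.10},
    ("dataset_A", "model_2"): {"flow_duration": 0.30, "dst_port": 0.05, "pkt_len_mean": 0.45},
    ("dataset_B", "model_1"): {"flow_duration": 0.50, "dst_port": 0.20, "pkt_len_mean": 0.25},
}

in_every_run = common_features(runs, k=2)                   # strict intersection
in_most_runs = common_features(runs, k=2, min_share=0.6)    # at least 60% of runs
```

With these placeholder values, only `flow_duration` ranks in the top two for every run, while relaxing `min_share` also admits `pkt_len_mean`. A strict intersection gives the smallest feature set; a lower share trades minimality for coverage.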

Highlights

Growing consumer, business, and industrial demand for advanced Internet of Things (IoT) solutions creates unique challenges to securing these devices.
Thousands or possibly millions of IoT devices can be controlled by a command and control (C&C) server.
This paper explores the output of various DL models by implementing SHapley Additive Explanation (SHAP) and Local Interpretable Model-Agnostic Explanation (LIME), analyzing predictions, finding commonalities to avoid bias, improving classifier quality and reliability, and extracting the top contributing features that most influenced the model predictions.
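SHAP attributions are based on Shapley values. The sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition, which is only feasible for a handful of features; practical SHAP implementations rely on approximations instead. It does not use the SHAP library and is not the authors' code. The scoring function `toy_score`, its three feature names, and the all-zero baseline are hypothetical, chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x; absent features take baseline values."""
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features set to x.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_score(z):
    """Hypothetical anomaly score with one interaction term."""
    duration, bytes_sent, failed_logins = z
    return 0.5 * bytes_sent + 2.0 * failed_logins + 0.1 * duration * failed_logins

x = [10.0, 4.0, 3.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]    # reference input (assumed)
phi = shapley_values(toy_score, x, baseline)

# Rank features by absolute contribution, largest first.
ranked = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The attributions sum to the difference between the score for `x` and the score for the baseline, and the interaction term is split equally between the two features involved. Ranking by absolute value gives the per-prediction analogue of a top contributing features list.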

Summary

Introduction

Growing consumer, business, and industrial demand for advanced Internet of Things (IoT) solutions creates unique challenges to securing these devices. Regarding data quality and reliability, many recent IoT IDS studies are based on very old benchmark datasets such as KDD CUP 99 [7,8] or NSL-KDD [9,10,11]. These datasets lack modern network traffic patterns and information about current attacks [5,12,13].

Literature Review

Related Work

Benchmark Datasets

Dataset Quality and Reliability Issues

Deep Learning Classifiers for Sequential Data

Feature Importance

Proposed Framework

Individual Prediction Interpretation—Localized Explanation

Model Interpretation—Global Explanation

Top Contributing Features

Findings

Conclusions and Future Work

Journal: Future Internet
Publication Date: Dec 17, 2021
Citations: 1
License type: CC BY 4.0
