Applying big data based deep learning system to intrusion detection

Wei Zhong, Chunyu Ai, Ning Yu

doi:10.26599/bdma.2020.9020003

Abstract

With vast amounts of data being generated daily and the ever-increasing interconnectivity of the world's internet infrastructures, machine learning based Intrusion Detection Systems (IDS) have become a vital component in protecting our economic and national security. Previous shallow learning and deep learning strategies adopt a single learning model approach for intrusion detection. A single learning model may have difficulty capturing the increasingly complicated data distribution of intrusion patterns. In particular, a single deep learning model may not be effective at capturing the unique patterns of intrusive attacks that have only a small number of samples. To further enhance the performance of machine learning based IDS, we propose the Big Data based Hierarchical Deep Learning System (BDHDLS). BDHDLS uses behavioral features and content features to understand both network traffic characteristics and the information stored in the payload. Each deep learning model in BDHDLS concentrates on learning the unique data distribution of one cluster. Compared with previous single learning model approaches, this strategy can increase the detection rate of intrusive attacks. Using a parallel training strategy and big data techniques, the model construction time of BDHDLS is reduced substantially when multiple machines are deployed.
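As a minimal sketch of combining the two feature views for a single flow, the Python below builds a behavioral vector (packet count, total bytes, duration, mean packet size) and a content vector (a normalized 256-bin byte histogram of the payload), then concatenates them. The `Flow` record and both feature choices are hypothetical stand-ins: the abstract does not list the actual BDHDLS features, which are generated with big data techniques rather than in one process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """Hypothetical flow record: per-packet timestamps and sizes plus the reassembled payload."""
    timestamps: List[float]
    packet_sizes: List[int]
    payload: bytes


def behavioral_features(flow: Flow) -> List[float]:
    # Traffic characteristics: packet count, total bytes, duration (seconds), mean packet size.
    n = len(flow.packet_sizes)
    total = float(sum(flow.packet_sizes))
    duration = max(flow.timestamps) - min(flow.timestamps) if flow.timestamps else 0.0
    mean_size = total / n if n else 0.0
    return [float(n), total, duration, mean_size]


def content_features(flow: Flow) -> List[float]:
    # Payload information: relative frequency of each of the 256 possible byte values.
    counts = [0] * 256
    for byte in flow.payload:  # iterating over bytes yields ints in 0..255
        counts[byte] += 1
    size = len(flow.payload)
    return [c / size for c in counts] if size else [0.0] * 256


def combined_features(flow: Flow) -> List[float]:
    # Concatenate both views so one model sees traffic shape and payload content together.
    return behavioral_features(flow) + content_features(flow)


flow = Flow(timestamps=[0.0, 0.5, 1.5], packet_sizes=[60, 1500, 40], payload=b"GET /index.html")
vector = combined_features(flow)
print(len(vector), vector[:4])
```

The two groups differ greatly in scale (byte counts versus frequencies), so in practice they would usually be normalized before any distance-based clustering.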

Highlights

With vast amounts of data being created every day and the ever increasing interconnectivity of the world’s internet infrastructures, more and more security vulnerabilities in these infrastructures are discovered by security experts every month[1]
To avoid the potential weaknesses of traditional signature-based approaches, researchers have employed shallow learning models with fewer than three computational layers, such as Decision Tree (DT) and Support Vector Machine (SVM), for intrusion detection[2]
Construction of the Big Data based Hierarchical Deep Learning System (BDHDLS) is divided into five phases: Phase 1: generating behavioral features and content features using big data techniques; Phase 2: partitioning the dataset into multiple one-level clusters using a Spark based parallel improved K-means algorithm; Phase 3: generating multi-level cluster trees in parallel; Phase 4: building a deep learning model for each cluster; Phase 5: merging the decisions of the deep learning models in different clusters to classify samples as intrusive or benign
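Phases 2, 4 and 5 can be illustrated on a toy scale with the Python sketch below. It makes several loud substitutions: plain single-machine Lloyd's K-means replaces the paper's Spark based parallel improved K-means, the multi-level tree of Phase 3 is skipped, a nearest-class-centroid classifier is a hypothetical stand-in for each cluster's deep learning model, and decision merging is reduced to routing each sample to the specialist of its nearest cluster. All function names are illustrative, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(p: Point, centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's K-means; the first k points seed the centroids (deterministic for the demo)."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups: Dict[int, List[Point]] = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [centroid(groups[c]) if groups[c] else centroids[c] for c in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


class NearestClassCentroid:
    """Hypothetical stand-in for one cluster's deep model: predicts the label with the closest mean."""

    def fit(self, xs: List[Point], ys: List[str]) -> "NearestClassCentroid":
        by_label: Dict[str, List[Point]] = defaultdict(list)
        for x, y in zip(xs, ys):
            by_label[y].append(x)
        self.means = {y: centroid(pts) for y, pts in by_label.items()}
        return self

    def predict(self, x: Point) -> str:
        return min(self.means, key=lambda y: sq_dist(x, self.means[y]))


def train_per_cluster(xs: List[Point], ys: List[str], k: int):
    centroids = kmeans(xs, k)
    members: Dict[int, List[Tuple[Point, str]]] = defaultdict(list)
    for x, y in zip(xs, ys):
        members[nearest(x, centroids)].append((x, y))
    # One specialist per non-empty cluster, trained only on that cluster's samples.
    models = {c: NearestClassCentroid().fit([x for x, _ in m], [y for _, y in m])
              for c, m in members.items()}
    return centroids, models


def classify(x: Point, centroids: List[Point], models: Dict[int, NearestClassCentroid]) -> str:
    # Route the sample to the closest cluster that has a trained specialist.
    c = min(models, key=lambda i: sq_dist(x, centroids[i]))
    return models[c].predict(x)


xs = [(0, 0), (10, 10), (0, 1), (10, 11), (2, 0), (12, 10), (2, 1), (12, 11)]
ys = ["benign", "benign", "benign", "benign", "attack", "attack", "attack", "attack"]
centroids, models = train_per_cluster(xs, ys, k=2)
print(classify((1.8, 0.2), centroids, models), classify((10.1, 10.6), centroids, models))
```

The toy data shows why specialization can help: a single nearest-class-centroid model fitted on all eight points labels (1.8, 0.2) benign, because the global attack mean (7, 5.5) is farther from it than the global benign mean (5, 5.5), whereas the specialist for the left cluster learns that cluster's local boundary and flags it as an attack.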

Summary

Introduction

With vast amounts of data being created every day and the ever-increasing interconnectivity of the world’s internet infrastructures, security experts discover more and more security vulnerabilities in these infrastructures every month[1]. To evaluate different approaches to building an intrusion detection system, the performance of five computational models is compared: (1) a single Decision Tree (DT) built on the entire dataset; (2) a single Support Vector Machine (SVM) built on the entire dataset; (3) a single deep Convolutional Neural Network (CNN) built on the entire dataset; (4) a single model (RNNCNN) combining a Recurrent Neural Network (RNN) and a CNN; […] Major contributions of this work include (1) the use of big data techniques, namely Apache Spark, for feature selection and clustering; (2) the simultaneous incorporation of both behavioral and content-based features to improve prediction accuracy; and (3) the adoption of multiple deep learning models in a hierarchical tree structure to learn the unique traffic patterns of each intrusive attack family.
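Because the comparison hinges on how well each model detects attacks, including attack families with few samples, it helps to see why overall accuracy alone can mislead. The Python sketch below computes accuracy, a binary detection rate and false alarm rate (all attack families collapsed into "intrusive"), and a per-family detection rate. These are common textbook definitions assumed for illustration, not necessarily the exact metrics the paper reports; the `"benign"` label, the family names and the toy predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

BENIGN = "benign"  # assumed label for normal traffic; every other label is an attack family


def ratio(a: int, b: int) -> float:
    return a / b if b else 0.0


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Collapse all attack families into one 'intrusive' class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        is_attack, flagged = t != BENIGN, p != BENIGN
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "detection_rate": ratio(tp, tp + fn),  # share of attacks that were flagged
        "false_alarm_rate": ratio(fp, fp + tn),  # share of benign traffic wrongly flagged
    }


def per_family_detection_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of each attack family's samples assigned to the correct family."""
    totals = Counter(t for t in y_true if t != BENIGN)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t != BENIGN and t == p)
    return {family: ratio(hits[family], n) for family, n in totals.items()}


y_true = [BENIGN] * 6 + ["dos"] * 3 + ["u2r"]
y_pred = [BENIGN] * 6 + ["dos"] * 3 + [BENIGN]  # the single rare-family sample is missed
print(binary_metrics(y_true, y_pred))
print(per_family_detection_rate(y_true, y_pred))
```

Here an accuracy of 0.9 hides the fact that the rare family is never detected, which is the failure mode the abstract attributes to single learning models.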

Related Work

BDHDLS for Intrusion Detection

Generation of behavioral features and content features

Generation of one-level clusters

Generation of hierarchical based cluster tree

Deep learning model training for each cluster

Decision fusion algorithm

Datasets for combined 5×2 cross-validation F test and independent test
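The combined 5×2 cross-validation F test named in this heading is, in its standard form (Alpaydın, 1999), computed from five replications of 2-fold cross-validation. With p_i^(j) the difference in error rate between two classifiers on fold j of replication i, p̄_i the mean of the two differences in replication i, and s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)², the statistic is f = Σ_i Σ_j (p_i^(j))² / (2 Σ_i s_i²), which is approximately F-distributed with 10 and 5 degrees of freedom. The sketch below assumes the paper uses this standard formulation; the error-rate differences are hypothetical.

```python
from typing import Sequence, Tuple

# Approximate upper 5% critical value of the F(10, 5) distribution.
F_CRIT_10_5 = 4.74


def combined_5x2cv_f(diffs: Sequence[Tuple[float, float]]) -> float:
    """diffs[i] = (p_i1, p_i2): error-rate differences between two models on both folds of replication i."""
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications of 2-fold cross-validation")
    # Numerator: sum of all ten squared differences.
    numerator = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs)
    # Denominator: twice the sum of the per-replication variance estimates s_i^2.
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0:
        raise ValueError("statistic undefined: differences are identical within every replication")
    return numerator / (2 * variance_sum)


# Hypothetical error-rate differences (model A minus model B) from 5 x 2-fold CV.
diffs = [(0.02, 0.04)] * 5
f = combined_5x2cv_f(diffs)
print(round(f, 3), "significant at 0.05" if f > F_CRIT_10_5 else "not significant at 0.05")
```

Pooling the squared differences over all ten folds in the numerator is what distinguishes this combined test from the older 5×2 cv paired t test, which uses only one of the ten differences in its numerator.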

Performance evaluation metrics

Model configuration for different clusters in the subtree

Performance comparison of different feature sets

Independent test results for ISCX2012 dataset

Independent test results for CICIDS2017 dataset

Results for DARPA1998 dataset

Construction time for BDHDLS when different numbers of machines are used

Conclusion and Future Work

Journal: Big Data Mining and Analytics
Publication Date: Jul 16, 2020
Citations: 91
License type: CC BY


Similar Papers

Learning to Detect: A Data-driven Approach for Network Intrusion Detection
Zachary Tauscher ... Jian Wang
29 Oct 2021

Binary Arithmetic Optimization Algorithm with Machine Learning based Intrusion Detection System
S P Senthilkumar et al.
International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 11
30 Oct 2023

Automatic Feature Extraction and Selection For Machine Learning Based Intrusion Detection
Jinjie Liu ... Sun Sunnie Chung
01 Aug 2019

Transfer Learning Based Intrusion Detection
Zahra Taghiyarrenani ... Ehsan Mahdavi
01 Oct 2018
