Abstract

With vast amounts of data generated daily and the ever-increasing interconnectivity of the world's internet infrastructures, machine learning based Intrusion Detection Systems (IDS) have become a vital component in protecting our economic and national security. Previous shallow learning and deep learning strategies adopt a single learning model for intrusion detection. A single model may struggle to capture the increasingly complicated data distributions of intrusion patterns. In particular, a single deep learning model may not be effective at capturing the unique patterns of intrusive attacks that have only a small number of samples. To further enhance the performance of machine learning based IDS, we propose the Big Data based Hierarchical Deep Learning System (BDHDLS). BDHDLS utilizes behavioral features and content features to understand both network traffic characteristics and the information stored in the payload. Each deep learning model in BDHDLS concentrates on learning the unique data distribution of one cluster. This strategy can increase the detection rate of intrusive attacks compared with previous single learning model approaches. Based on a parallel training strategy and big data techniques, the model construction time of BDHDLS is reduced substantially when multiple machines are deployed.

Highlights

  • With vast amounts of data being created every day and the ever-increasing interconnectivity of the world’s internet infrastructures, security experts discover more and more security vulnerabilities in these infrastructures every month[1]

  • To avoid the potential weaknesses of traditional signature-based approaches, researchers have employed shallow learning models with fewer than three computational layers, such as Decision Tree (DT) and Support Vector Machine (SVM), for intrusion detection[2]

  • Construction of the Big Data based Hierarchical Deep Learning System (BDHDLS) is divided into five phases: Phase 1: Generating behavioral features and content features using big data techniques; Phase 2: Partitioning the dataset into multiple one-level clusters using a Spark based parallel improved K-means algorithm; Phase 3: Generating multi-level cluster trees in parallel; Phase 4: Building a deep learning model for each cluster; Phase 5: Merging the decisions of the deep learning models in different clusters to classify samples as intrusive or benign
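
The flow of Phases 2, 4 and 5 can be illustrated with a small single-machine sketch. It is not the authors' implementation: plain K-means stands in for the Spark based parallel improved K-means, a nearest-class-centroid classifier (the hypothetical `CentroidModel`) stands in for each deep learning model, and the distance-weighted vote in `classify` is an assumed fusion rule. Phase 3 (multi-level cluster trees) is omitted, and all function names and the toy data are illustrative.

```python
import math
import random
from collections import Counter

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means; a stand-in for the Spark based parallel improved K-means."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster received no points.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

class CentroidModel:
    """Placeholder per-cluster model (nearest class centroid), NOT a deep network."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = [mean([x for x, t in zip(X, y) if t == lab])
                          for lab in self.labels]
        return self

    def predict(self, x):
        return self.labels[nearest(x, self.centroids)]

def build(X, y, k):
    """Phase 2: cluster the data. Phase 4: fit one model per non-empty cluster."""
    centers = kmeans(X, k)
    assignment = [nearest(x, centers) for x in X]
    models = []
    for c in range(k):
        idx = [i for i, a in enumerate(assignment) if a == c]
        models.append(CentroidModel().fit([X[i] for i in idx], [y[i] for i in idx])
                      if idx else None)
    return centers, models

def classify(x, centers, models, eps=1e-9):
    """Phase 5 (assumed rule): each model votes, weighted by 1 / distance to its cluster."""
    votes = Counter()
    for center, model in zip(centers, models):
        if model is not None:
            votes[model.predict(x)] += 1.0 / (math.dist(x, center) + eps)
    return votes.most_common(1)[0][0]

# Toy two-dimensional data: four benign and four intrusive samples.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["benign"] * 4 + ["intrusive"] * 4
centers, models = build(X, y, k=2)
print(classify([0.5, 0.5], centers, models))
print(classify([5.5, 5.5], centers, models))
```

Because each model only sees the samples of its own cluster, a rare attack family that forms its own cluster is not drowned out by the majority traffic, which is the motivation the highlights give for the per-cluster design.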


Summary

Introduction

With vast amounts of data being created every day and the ever-increasing interconnectivity of the world’s internet infrastructures, security experts discover more and more security vulnerabilities in these infrastructures every month[1]. To evaluate different approaches for building an intrusion detection system, the performance of five computational models is compared: (1) a single Decision Tree (DT) built on the entire dataset; (2) a single Support Vector Machine (SVM) built on the entire dataset; (3) a single deep Convolutional Neural Network (CNN) built on the entire dataset; (4) a single model (RNNCNN) combining Recurrent Neural Network (RNN) and. Major contributions of this work include (1) use of the Apache Spark big data framework for feature selection and clustering; (2) simultaneous incorporation of behavioral and content-based features to improve prediction accuracy; and (3) adoption of multiple deep learning models in a hierarchical tree structure to learn the unique traffic patterns of each intrusive attack family.
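
As a rough illustration of the second contribution, the sketch below concatenates behavioral features (flow statistics) and content features (payload statistics) into one vector per flow. The specific features chosen here (duration, packet count, byte count, mean packet size, and a coarse byte-value histogram) are assumptions for the example; the text does not list the paper's actual feature set, and the function names are hypothetical.

```python
def behavioral_features(packets):
    """Flow-level statistics. packets: list of (timestamp_seconds, payload_bytes)."""
    times = [t for t, _ in packets]
    sizes = [len(p) for _, p in packets]
    duration = max(times) - min(times)
    return [duration, float(len(packets)), float(sum(sizes)), sum(sizes) / len(sizes)]

def content_features(packets, bins=4):
    """Normalised histogram of payload byte values, grouped into `bins` ranges."""
    counts = [0] * bins
    total = 0
    for _, payload in packets:
        for b in payload:          # iterating over bytes yields ints in 0..255
            counts[b * bins // 256] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]

def flow_vector(packets):
    """Behavioral and content features side by side in a single vector."""
    return behavioral_features(packets) + content_features(packets)

# Hypothetical flow of two packets.
flow = [(0.0, b"GET /"), (0.5, b"\x00\xff")]
vector = flow_vector(flow)
print(vector)
```

A model trained on such a combined vector sees both how the traffic behaves and what the payload contains, so an attack that looks normal in one view can still stand out in the other.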

Related Work
BDHDLS for Intrusion Detection
Generation of behavioral features and content features
Generation of one-level clusters
Generation of hierarchical based cluster tree
Deep learning model training for each cluster
Decision fusion algorithm
Datasets for combined 5×2 cross-validation F test and independent test
Performance evaluation metrics
Model configuration for different clusters in the subtree
Performance comparison of different feature sets
Independent test results for ISCX2012 dataset
Independent test results for CICIDS2017 dataset
Results for DARPA1998 dataset
Construction time for BDHDLS when different numbers of machines are used
Conclusion and Future Work