Abstract

This paper addresses an important research problem in cyberspace security. As an active defense technology, intrusion detection plays an important role in network security. Traditional intrusion detection technologies suffer from low accuracy, low detection efficiency, and long processing times, and shallow machine learning structures have been unable to respond in time. To solve these problems, deep learning-based methods have been studied to improve intrusion detection. The advantage of deep learning is that it has a strong ability to learn features and can handle very complex data. We therefore propose a deep random forest-based network intrusion detection model. The first stage uses a sliding window to segment the original features into many small pieces and then trains a random forest on them to generate a concatenated class vector as a re-representation. This vector is used to train the multilevel cascade of parallel random forests in the second stage. Finally, the class of the original data is determined by a voting strategy after the last layer of the cascade. The model is deployed in a Spark environment, where the cache replacement strategy of RDDs is optimized by efficiency sorting and a partition integrity check. The experimental results indicate that the proposed method can effectively detect anomalous network behaviors, with high F1-measure scores and high accuracy. The results also show that it reduces the average execution time on clusters of different scales.
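The first and last steps described above (sliding-window segmentation and the final vote) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `sliding_windows` and `vote`, the window and stride values, and the choice to vote by averaging class vectors are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Sequence


def sliding_windows(features: Sequence[float], window: int,
                    stride: int = 1) -> List[List[float]]:
    """Cut one feature vector into overlapping pieces of length `window`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if window > len(features):
        raise ValueError("window is longer than the feature vector")
    return [list(features[i:i + window])
            for i in range(0, len(features) - window + 1, stride)]


def vote(class_vectors: Sequence[Sequence[float]]) -> int:
    """Average per-forest class vectors and return the winning class index.

    Averaging is an assumed voting rule, not one confirmed by the paper.
    """
    n_classes = len(class_vectors[0])
    mean = [sum(v[c] for v in class_vectors) / len(class_vectors)
            for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Five raw features, window of 3, stride of 1 -> three overlapping pieces.
pieces = sliding_windows([0.1, 0.4, 0.0, 0.9, 0.3], window=3, stride=1)

# Three made-up class vectors over two classes (normal, attack).
label = vote([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]])
```

Here `pieces` holds the three windows `[0.1, 0.4, 0.0]`, `[0.4, 0.0, 0.9]`, and `[0.0, 0.9, 0.3]`, and `label` is `0` because the averaged vector favors the first class. In the model described above, each piece would be passed to a trained random forest rather than used directly.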

Highlights

  • A Deep Random Forest Model on Spark for Network Intrusion Detection. Received 12 October 2020; Revised 6 December 2020; Accepted 11 December 2020; Published 22 December 2020.

  • Academic Editor: Salvatore Carta. This paper focuses on an important research problem of cyberspace security.

  • Most conventional methods, such as back propagation (BP), support vector machine (SVM), and random forest (RF) [9], achieve unacceptable detection accuracy when network data are complex and high-dimensional; their accuracy on the UNSW-NB15 dataset does not exceed 91% [10]. That reveals that the shallow structure of machine learning has struggled to respond.


Summary

A Deep Random Forest Model on Spark for Network Intrusion Detection

The shallow structure of machine learning has been unable to respond in time. To solve these problems, deep learning-based methods have been studied to improve intrusion detection. Ensemble learning is an important approach in machine learning, and random forest (RF) is one of its classic algorithms. It suits high-dimensional data, has only a few parameters, and is not complicated to train. To reduce the obstacles caused by the numerous parameters of current deep learning-based intrusion detection methods, and to further improve classification accuracy and scalability, this paper proposes a detection model based on feature segmentation and a deep structure of parallelized random forests (FS-DPRF). (1) A deep cascade structure of random forests is proposed, and each layer is parallelized to improve accuracy and scalability and to handle the massive data of the detection task.
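The data flow of a parallelized cascade can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: each "forest" is a hypothetical stand-in callable rather than a trained random forest, a thread pool stands in for Spark's parallelism, and each layer's class vectors are appended to the original features before the next layer, as in the usual cascade-forest design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical type: a "forest" maps a feature vector to a class vector.
Forest = Callable[[Sequence[float]], List[float]]


def run_layer(forests: Sequence[Forest], x: Sequence[float]) -> List[List[float]]:
    """Evaluate every forest of one layer concurrently, keeping forest order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda f: f(x), forests))


def cascade(layers: Sequence[Sequence[Forest]],
            features: Sequence[float]) -> List[List[float]]:
    """Pass one sample through all layers and return the last class vectors."""
    x = list(features)
    class_vectors: List[List[float]] = []
    for forests in layers:
        class_vectors = run_layer(forests, x)
        # Assumed augmentation: original features plus this layer's outputs.
        x = list(features) + [p for v in class_vectors for p in v]
    return class_vectors


def toy_forest(bias: float) -> Forest:
    """Stand-in for a trained forest: scores the sample by its mean value."""
    def predict(x: Sequence[float]) -> List[float]:
        score = min(1.0, max(0.0, sum(x) / len(x) + bias))
        return [1.0 - score, score]
    return predict


layers = [[toy_forest(0.0), toy_forest(0.1)],
          [toy_forest(-0.1), toy_forest(0.05)]]
final_vectors = cascade(layers, [0.2, 0.4, 0.6])
```

`final_vectors` contains one two-class vector per forest in the last layer; a voting step over these vectors would then produce the final label. On a real cluster, the per-forest work inside `run_layer` is the part that would be distributed.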

Related Work
Result
Parallelization on Spark
Experiment
Findings
Conclusion and Future Work
