Abstract

With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing tremendously. Efficient anomaly detection on this massive traffic data is crucial to many network applications, such as malware detection, load balancing, and network intrusion detection. Although many methods exist for network traffic anomaly detection, they are designed for a single machine and cannot handle traffic data so large that storing and processing it on one computer is prohibitive. To solve this problem, we propose a parallel algorithm for network traffic anomaly detection based on Isolation Forest and Spark. The algorithm combines the strengths of Isolation Forest in network traffic anomaly detection with the big data processing capability of Spark, and it parallelizes both the modeling and the evaluation process. By assigning tasks to multiple compute nodes, the algorithm performs anomaly detection and evaluation efficiently and removes the computation bottleneck of a single machine. Extensive experiments on real-world datasets show that the proposed algorithm is efficient and scales well for anomaly detection on large network traffic data.

Highlights

  • Driven by new applications such as social networks, location-based services, and video sharing, the scale of the Internet continues to expand

  • To process large-scale network traffic data, we propose a parallel algorithm for network traffic anomaly detection based on Isolation Forest and Spark (SPIF)

  • To remove the computation bottleneck of a single machine, we propose a parallel model for network traffic anomaly detection based on Isolation Forest

Summary

Introduction

Driven by new applications such as social networks, location-based services, and video sharing, the scale of the Internet continues to expand. Network anomaly detection, originally proposed by Denning,[4] refers to filtering abnormal information out of traffic data and identifying and diagnosing the security status of the network, so as to ensure that the network functions properly. Processing large-scale network traffic data requires parallelizing the Isolation Forest algorithm. To this end, we propose a parallel algorithm for network traffic anomaly detection based on Isolation Forest and Spark (SPIF). We use a parallel strategy to evaluate multiple data points simultaneously, which improves the efficiency of anomaly evaluation. The parallel algorithm is presented in section "Proposed model." Section "Experimental study" briefly introduces the experimental environment and data and analyzes the experimental results in detail. Section "Conclusion" draws conclusions and outlines further work.
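The idea of building isolation trees independently and scoring many records at once can be illustrated with a small sketch. This is not the SPIF implementation: it uses only the Python standard library, and a thread pool stands in for Spark's compute nodes. The function names (fit_forest, score_all), the defaults of 100 trees and 256 samples per tree, and the seeding scheme are assumptions made for illustration. The anomaly score follows the standard Isolation Forest definition, s(x) = 2^(-E[h(x)] / c(psi)), where h(x) is the path length of x in a tree and psi is the subsample size.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common convention for the two-point case
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the expected depth of the unbuilt subtree."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def train_one(job):
    """Build one tree from its own random subsample (one independent task)."""
    data, sample_size, seed = job
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    return build_tree(sample, 0, max_depth, rng)


def fit_forest(data, n_trees=100, sample_size=256, workers=4, seed=0):
    """Train the trees in parallel; returns (forest, subsample size psi)."""
    jobs = [(data, sample_size, seed + i) for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        forest = list(pool.map(train_one, jobs))
    return forest, min(sample_size, len(data))


def score(forest, psi, x):
    """Anomaly score in (0, 1]; values near 1 suggest an anomaly."""
    mean_len = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_len / c_factor(psi))


def score_all(forest, psi, data, workers=4):
    """Score many records at once (the parallel evaluation step)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: score(forest, psi, x), data))
```

Because each tree depends only on its own subsample and seed, tree construction has no shared state, which is what makes the modeling step easy to distribute. Scoring is likewise independent per record once the forest is available to every worker. In a Spark setting these two map steps would run over partitions on the cluster rather than over threads on one machine.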

Related work
Running speed
Versatility
Run everywhere
Experimental study
Conclusion