Abstract
Anomaly detection in network traffic is becoming a challenging task due to the complexity of large-scale networks and the proliferation of various social network applications. In an actual industrial environment, only recently obtained unlabelled data can be used as the training set. The accuracy of the abnormal ratio of the training set, supplied as prior knowledge, has a great influence on the performance of the commonly used unsupervised algorithms. In this study, an anomaly detection algorithm based on X-means and iForest, named X-iForest, is proposed. It uses X-means to cluster the standard Euclidean distances between the abnormal points and the normal cluster centre, which achieves a secondary filtering. We compared X-iForest with seven mainstream unsupervised algorithms in terms of AUC and anomaly detection rate. Extensive experiments showed that X-iForest has notable advantages over the other algorithms and can be applied well to anomaly detection in large-scale network traffic data.
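The secondary filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a first-stage detector (such as iForest) has already flagged candidate points, it reads "standard Euclidean distance" as a variance-scaled Euclidean distance, and it substitutes a fixed two-cluster 1-D k-means for X-means (which would choose the number of clusters itself). The names `secondary_filter`, `two_means_1d` and `standardized_distance` are hypothetical.

```python
import math
from statistics import mean, pvariance


def standardized_distance(point, centre, variances):
    """Euclidean distance with each dimension scaled by its variance (assumed reading)."""
    return math.sqrt(sum((p - c) ** 2 / v for p, c, v in zip(point, centre, variances)))


def two_means_1d(values, iters=50):
    """Two-cluster k-means on scalars; a stand-in for X-means in this sketch."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        near = [v for v in values if abs(v - lo) <= abs(v - hi)]
        far = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = mean(near)
        new_hi = mean(far) if far else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def secondary_filter(points, flagged):
    """Return indices of flagged points that fall in the far-distance cluster."""
    candidates = [i for i, f in enumerate(flagged) if f]
    if not candidates:
        return []
    normal = [p for p, f in zip(points, flagged) if not f]
    dims = list(zip(*normal))
    centre = [mean(d) for d in dims]
    # Fall back to 1.0 for constant dimensions to avoid division by zero.
    variances = [pvariance(d) or 1.0 for d in dims]
    dists = {i: standardized_distance(points[i], centre, variances) for i in candidates}
    lo, hi = two_means_1d(list(dists.values()))
    # Keep only candidates closer to the "far" centroid than to the "near" one.
    return [i for i in candidates if abs(dists[i] - lo) > abs(dists[i] - hi)]
```

In this sketch a flagged point that lies close to the normal centre ends up in the near cluster and is dropped from the anomaly set, which is the intent of a second filtering pass.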
Highlights
In recent years, the network environment has become increasingly complex.
The complex network environment and the surge of traffic data make the detection of network traffic anomalies a considerable challenge facing enterprises today.
In addition to the explosive growth of traffic data, the unsupervised anomaly detection algorithms currently used in industrial applications cannot be implemented well in a real, complex network environment.
Summary
Following extensive investigations of actual industrial applications and recently published articles in the field of network health analysis and network traffic anomaly detection, the main methods can be classified as follows. X-means and isolation forest based methodology for network traffic anomaly detection and proposed a new general formula for distance calculation and a PCA-based IoT detection method. They verified the feasibility of their proposed method through a variety of experiments. Distance-based approaches incur a very high computational cost for massive datasets, with a loss of performance when applied to network traffic anomaly detection. These approaches introduce the concept of LOF, in which each instance is assigned a score, based on its neighbours' local density, that denotes its degree of outlierness.
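To make the LOF idea mentioned above concrete, here is a minimal pure-Python sketch of the local outlier factor. It is illustrative only and makes simplifying assumptions: each point gets exactly `k` neighbours (ties at the k-distance are not expanded), the data contains no group of more than `k` identical points, and the all-pairs distance matrix makes it quadratic in the number of points, which is the kind of cost the summary warns about for massive datasets. The function name `lof_scores` is hypothetical.

```python
import math


def lof_scores(points, k=2):
    """Local outlier factor per point; values near 1 are inliers, larger are outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        # k nearest neighbours of point i, excluding i itself.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    def local_reachability_density(i):
        # Reachability distance of i from j: max(k-distance of j, d(i, j)).
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]
    # LOF: mean neighbour density divided by the point's own density.
    return [
        sum(density[j] for j in neighbours[i]) / (len(neighbours[i]) * density[i])
        for i in range(n)
    ]
```

A point in a sparse region has a lower density than its neighbours, so its ratio, and therefore its score, rises well above 1.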