Abstract

The explosive growth in network traffic in recent times has resulted in increased processing pressure on network intrusion detection systems. In addition, there is a lack of reliable methods for preprocessing network traffic generated by benign applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between benign and malicious traffic produced by benign applications and malware, respectively. To fully express these differences, this study proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, our method aggregated historical network traffic into two types of clusters. The cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. The experimental results were compared using four types of clustering algorithms. The density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm was selected to mine benign communication channels. We also compared our method with two other methods, and the results demonstrated that the benign channels mined through our method were more reliable. Finally, using our method, 1,811 benign transport layer security (TLS) channels were mined from 18,357 TLS communication channels. The number of flows carried by these benign channels comprised 65.37% of the entire network flows, and no malicious flow was included in our results, which proves the effectiveness of our method.
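The cluster-labeling rule described in the abstract (a cluster with no labeled malicious sample is regarded as benign) can be illustrated with a minimal sketch. It assumes the cluster assignments have already been produced by a clustering algorithm such as DBSCAN. The function names `benign_clusters` and `benign_channels`, the toy data, and the choice to never treat noise points as benign are illustrative assumptions, not details taken from the paper.

```python
NOISE = -1  # label that DBSCAN-style algorithms commonly give to unclustered points


def benign_clusters(cluster_ids, malicious_idx):
    """Return the ids of clusters that contain no labeled malicious sample.

    cluster_ids   -- one cluster id per channel (e.g. output of a clustering step)
    malicious_idx -- indices of the few channels labeled as malicious
    """
    # Any cluster that holds at least one labeled malicious channel is tainted.
    tainted = {cluster_ids[i] for i in malicious_idx}
    # Assumption: noise points are excluded rather than treated as benign.
    return {c for c in set(cluster_ids) if c != NOISE and c not in tainted}


def benign_channels(cluster_ids, malicious_idx):
    """Return the indices of channels that fall in a benign cluster."""
    keep = benign_clusters(cluster_ids, malicious_idx)
    return [i for i, c in enumerate(cluster_ids) if c in keep]


# Toy example (hypothetical data): 8 channels, 3 clusters, 1 noise point.
cluster_ids = [0, 0, 1, 1, 2, 2, -1, 0]
malicious_idx = [2]  # channel 2 is a labeled malicious sample

print(sorted(benign_clusters(cluster_ids, malicious_idx)))  # [0, 2]
print(benign_channels(cluster_ids, malicious_idx))          # [0, 1, 4, 5, 7]
```

Cluster 1 is discarded because it contains the labeled malicious channel, so its unlabeled neighbor (channel 3) is excluded as well. That conservative behavior is what lets a small number of labels filter traffic in batches.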

Highlights

  • Most of the communications making up internet traffic are generated by benign applications

  • Our network traffic preprocessing method can be realized by excluding the benign traffic contained in a cluster. Therefore, the contributions of this study are as follows: (1) This study proposes a new network traffic preprocessing method based on a semisupervised model. To distinguish it from the traditional single flow-based features, this study presents a new set of statistical features for building a clustering model based on the transport layer security (TLS) communication channels

  • We evaluated the performance of the feature subsets by calculating two indicators of the labeled samples: the false positive rate (FPR) and the true positive rate (TPR). The evaluation algorithm is outlined in Algorithm 2
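The two indicators named in the last highlight follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below computes them over labeled samples. It is not the paper's Algorithm 2, which is not reproduced here. The function name `tpr_fpr`, the 0/1 label encoding, and the choice of which class counts as positive are assumptions made for illustration.

```python
def tpr_fpr(y_true, y_pred, positive=1):
    """Compute (TPR, FPR) for labeled samples.

    y_true   -- ground-truth labels of the labeled samples
    y_pred   -- labels assigned by the model under evaluation
    positive -- which label is treated as the positive class (an assumption)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example (hypothetical labels): 4 positives and 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(tpr_fpr(y_true, y_pred))  # (0.75, 0.25)
```

A feature subset would be preferred when it yields a high TPR together with a low FPR on the labeled samples.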


Summary

Introduction

Most of the communications making up internet traffic are generated by benign applications. A similar process can be seen whereby the coarse classification model of the first layer is mainly used to exclude the benign traffic, the second layer is used to classify the different types of malicious traffic, and the third layer is used to identify different malware families [6]. In this process, the preprocessing method of the first layer is not described in detail, and its impact on the detection results is rarely evaluated. This study sought to find the differences between benign applications and malware on the TLS communication channel and to propose a preprocessing method that excludes benign TLS traffic. Applying this method to a network intrusion detection system (NIDS) can significantly reduce the processing pressure on the detection system.

