A botnet is a group of hijacked devices that conduct various cyberattacks, which is one of the most dangerous threats on the internet. Organizations or individuals use network traffic to mine botnet communication behavior features. Network traffic often contains individual users’ private information, such as website passwords, personally identifiable information, and communication content. Among the existing botnet detection methods, whether they extract deterministic traffic interaction features, use DNS traffic, or methods based on raw traffic bytes, these methods focus on the detection performance of the detection model and ignore possible privacy leaks. And most methods are combined with machine learning and deep learning technologies, which require a large amount of training data to obtain high-precision detection models. Therefore, preventing malicious persons from stealing data to infer privacy during the botnet detection process has become an issue worth pondering. Based on this problem, this article proposes a privacy-enhanced framework with deep learning for botnet detection. The goal of this framework is to learn a feature extractor. It can hide the private information that the attack model tries to infer from the intermediate anonymity features, while maximally retaining the interactive behavior features contained in the original traffic for botnet detection. We design a privacy confrontation algorithm based on a mutual information calculation mechanism. This algorithm simulates the game between the attacker trying to infer private information through the attack model and the data processor retaining the original content of the traffic to the maximum extent. In order to further ensure the privacy protection of the feature extractor during the training process, we train the feature extractor in the federated learning training mode. We extensively evaluate our approach, validating it on two public datasets and comparing it with existing methods. The results show that our method can effectively ensure detection accuracy on the basis of removing private information. For the CTU-13 dataset, the detection framework achieves the best detection performance; for the ISCX-2014 dataset, the accuracy of the framework is less than 1% lower than the best effect.
Read full abstract