Abstract

The effectiveness of deep-learning-based malicious traffic detection systems relies on high-quality labeled traffic datasets. However, existing approaches to labeling malicious traffic are error-prone, and the resulting label noise degrades model performance. To address this, various methods for learning with noisy labels have been proposed. These methods exclude samples suspected of being mislabeled from model updates to preserve accuracy. However, this also discards hard samples, which degrades the model's decision boundaries and its ability to classify such samples. In this paper, we propose a boundary-augmentation-based approach for malicious traffic identification named BoAu. Unlike other approaches, BoAu treats all samples, including hard samples, equally during training to construct more accurate decision boundaries and thereby improve accuracy. Meanwhile, a decision-boundary augmentation module is designed to mitigate the impact of mislabeled hard samples on boundary construction. This module adaptively adjusts the loss of each hard sample according to its distance to the cluster of its assigned label and its distances to other clusters, driving the shared feature-representation network to fit the true label distribution. We evaluated BoAu on malicious traffic identification with noisy labels using a dataset covering 22 classes of real-world encrypted malicious traffic. Experimental results showed that even with up to 90% noisy labels, BoAu achieved classification accuracy above 80%, outperforming state-of-the-art approaches. In addition, we validated the applicability of BoAu on several public datasets, including CIC-IDS-2017 and IoT-23.
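
To make the distance-based adaptive weighting concrete, below is a minimal PyTorch sketch of one plausible form of such a loss. The function name `boundary_augmented_loss`, the relative-distance weight `d_other / (d_own + d_other)`, and the use of per-class centroids are illustrative assumptions for exposition, not BoAu's published formulation.

```python
import torch
import torch.nn.functional as F

def boundary_augmented_loss(embeddings, logits, labels, centroids, eps=1e-8):
    """Per-sample cross-entropy reweighted by relative cluster distances.

    Illustrative sketch only; the exact weighting in BoAu may differ.
    embeddings: (B, D) features from the shared representation network
    logits:     (B, C) classifier outputs
    labels:     (B,)   possibly noisy integer labels
    centroids:  (C, D) class centroids maintained in embedding space
    """
    # Distance from each sample to every class centroid: (B, C)
    dists = torch.cdist(embeddings, centroids)

    # Distance to the centroid of the assigned (possibly noisy) label: (B,)
    d_own = dists.gather(1, labels.unsqueeze(1)).squeeze(1)

    # Distance to the nearest *other* centroid: (B,)
    masked = dists.scatter(1, labels.unsqueeze(1), float('inf'))
    d_other = masked.min(dim=1).values

    # Samples far from their labeled cluster but close to another are likely
    # mislabeled; shrink their contribution instead of discarding them.
    weight = d_other / (d_own + d_other + eps)  # in (0, 1)

    ce = F.cross_entropy(logits, labels, reduction='none')
    return (weight * ce).mean()
```

Down-weighting rather than discarding keeps every hard sample in every update, which is precisely the behavior the abstract contrasts with sample-exclusion methods.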
