The focus on privacy protection has brought much-encrypted network traffic. However, attackers always abuse traffic encryption to conceal malicious behaviors. Although researchers have proposed several enlightening detection methods, they must enhance the generalization ability or improve detection performance. Our inspiration is that the packet header fields, as do the underlying grammatical rules for constructing sentences, have a strict order. We consider the original packet as text and devise a robust approach with natural language processing and a deep learning model to improve the generalization ability and detection performance. We capture the critical keywords as characteristic representations of the traffic and design an adaptive domain generalization algorithm with a new loss function. It is robust against various datasets by generating more malicious samples to augment the minority of malicious samples. Simultaneously, we design an efficient feature selection algorithm, which obtains an optimal feature subset and reduces feature dimensions by 75.3%. To evaluate our work, we conducted extensive experiments with open-source datasets (CICIDS 2017, CICDDoS 2019, and USTC-TFC 2016), the synthetic dataset from IoT-23, and Internet backbone traffic (CERNET). Experimental results demonstrate that our proposal improves detection accuracy by up to 22.8% compared to others not using domain generalization algorithms and achieves an average detection latency of 0.67 s in the backbone. Besides, our work applies to the Industrial Internet of Things (IIoT) environment. It can be deployed at edge nodes to provide network security support for IIoT devices.
Read full abstract