As personal data privacy grows in importance, traffic encryption has become a central topic in network communication. In network security and management, research on encrypted traffic classification has increasingly turned to deep learning methods, which can achieve end-to-end classification of raw encrypted traffic with high accuracy. However, these methods do not reveal which parts of the encrypted traffic are critical to classification, and this limits their use in cybersecurity scenarios that demand high interpretability. The approach proposed in this paper combines a novel feature engineering method named “BITization”, which determines how features are encoded, with a sliding window technique that identifies which bytes contribute most to classification. With the proposed feature engineering, the accuracy of classical machine learning methods on encrypted traffic improves by at least 14.1%. In our experiments, the enhanced methods achieve an average accuracy of 98.6% and an average F1-score of 98.5% on the ISCX-VPN-Service, Cross-Platform-IOS, Cross-Platform-Android, and USTC-TFC2016 datasets, results we believe represent state-of-the-art performance.
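The abstract does not specify how BITization encodes features or how the sliding window is scored, so the following is only a minimal sketch under stated assumptions. It assumes BITization expands each payload byte into its 8 individual bits (done here with `numpy.unpackbits`), and it uses a simple nearest-centroid classifier as a stand-in for the "classical machine learning methods" to score each byte window by held-out accuracy. The function names `bitize`, `nearest_centroid_accuracy`, and `sliding_window_scores`, together with the window, stride, and split parameters, are hypothetical and are not taken from the paper.

```python
import numpy as np

def bitize(payloads: np.ndarray) -> np.ndarray:
    """Assumed BITization: expand each uint8 byte into 8 binary features (MSB first).

    payloads: shape (n_samples, n_bytes); returns shape (n_samples, n_bytes * 8).
    """
    return np.unpackbits(payloads.astype(np.uint8), axis=1)

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test) -> float:
    """Stand-in classical classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test row to every class centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_test).mean())

def sliding_window_scores(payloads, labels, window, stride, train_frac=0.7, seed=0):
    """Score each byte window [start, start + window) by held-out accuracy.

    Higher accuracy suggests the bytes in that window carry more class information.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    cut = int(train_frac * len(labels))
    train_idx, test_idx = order[:cut], order[cut:]
    scores = []
    for start in range(0, payloads.shape[1] - window + 1, stride):
        X = bitize(payloads[:, start:start + window]).astype(np.float64)
        acc = nearest_centroid_accuracy(
            X[train_idx], labels[train_idx], X[test_idx], labels[test_idx]
        )
        scores.append((start, acc))
    return scores

# Synthetic example: byte 5 is the only (hypothetical) class-dependent byte.
rng = np.random.default_rng(1)
n, n_bytes = 200, 16
labels = rng.integers(0, 2, size=n)
payloads = rng.integers(0, 256, size=(n, n_bytes), dtype=np.uint8)
payloads[:, 5] = np.where(labels == 1, 0xF0, 0x0F)

scores = sliding_window_scores(payloads, labels, window=2, stride=2)
best_start, best_acc = max(scores, key=lambda s: s[1])
print(best_start, best_acc)
```

On this synthetic data only the window covering bytes 4 and 5 separates the classes, so it should score highest. Operating on bits rather than whole byte values lets a linear or distance-based classifier respond to individual flag-like bits instead of treating each byte as a single magnitude, which is one plausible reason a bit-level encoding could help classical models.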