Tor protects users’ privacy better than other anonymous communication tools. Yet, although it resists deep packet inspection, Tor can be de-anonymized by the website fingerprinting (WF) attack, which aims to infer which website a user is browsing. WF attacks based on deep learning outperform those that rely on manually designed features and traditional machine learning. However, a deep learning model is data-hungry when learning the mapping between a traffic trace and the website it belongs to, which may be impractical in reality. In this paper, we investigate the composition mechanism of website fingerprints and address data shortage with bionic traffic traces. More precisely, we propose a new concept, the send-and-receive pair (SRP), to deconstruct traffic traces, and we design SRP-based cumulative features. We further reconstruct and generate bionic traces (BionicT) from the rearranged SRPs. The results show that our bionic traces improve the performance of the state-of-the-art deep-learning-based Var-CNN: the accuracy gain reaches up to 50% in the five-shot setting, considerably more than the data augmentation method HDA achieves. In the 15/20-shot setting, our method even outperforms TF, with more than 95% accuracy in closed-world scenarios and an F1-score above 90% in open-world scenarios. Moreover, extensive experiments show that our method strengthens the deep learning model’s ability to combat concept drift. Overall, the SRP can serve as an effective tool for analyzing and describing website traffic traces.
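To make the SRP concept concrete, the following is a minimal illustrative sketch, not the paper’s implementation. It assumes a trace is represented as a sequence of signed packet directions (+1 outgoing, -1 incoming) and that an SRP is an outgoing burst together with the incoming packets it elicits; the function names and the particular cumulative feature shown are hypothetical readings of the abstract.

    # Hypothetical sketch of the send-and-receive pair (SRP) idea.
    # Assumptions (not from the paper): a trace is a list of packet
    # directions (+1 = client->server, -1 = server->client), and an SRP
    # is a run of outgoing packets plus the incoming packets that
    # follow, up to the next outgoing packet.

    from typing import List, Tuple

    def split_into_srps(trace: List[int]) -> List[Tuple[int, int]]:
        """Split a direction sequence into (num_sent, num_received) pairs."""
        srps = []
        i, n = 0, len(trace)
        while i < n:
            sent = 0
            while i < n and trace[i] == 1:   # count the outgoing burst
                sent += 1
                i += 1
            received = 0
            while i < n and trace[i] == -1:  # count the incoming response
                received += 1
                i += 1
            srps.append((sent, received))
        return srps

    def cumulative_features(srps: List[Tuple[int, int]]) -> List[int]:
        """Running total of (received - sent) after each SRP -- one
        plausible reading of an 'SRP-based cumulative feature'."""
        total, feats = 0, []
        for sent, received in srps:
            total += received - sent
            feats.append(total)
        return feats

    if __name__ == "__main__":
        trace = [1, -1, -1, 1, 1, -1, -1, -1, 1, -1]
        srps = split_into_srps(trace)
        print(srps)                        # [(1, 2), (2, 3), (1, 1)]
        print(cumulative_features(srps))   # [1, 2, 2]

Under this reading, generating a bionic trace would amount to rearranging or recombining such SRPs from real traces and flattening them back into a direction sequence.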