Abstract

This work introduces a novel data augmentation method for few-shot website fingerprinting (WF) attacks, where only a handful of training samples per website are available for deep learning model optimization. Moving beyond earlier WF methods that rely on manually engineered feature representations, more advanced deep learning alternatives demonstrate that learning feature representations automatically from training data is superior. Nonetheless, this advantage rests on the unrealistic assumption that many training samples exist per website; without them, the advantage disappears. To address this, we introduce a model-agnostic, efficient, and harmonious data augmentation (HDA) method that can improve deep WF attack methods significantly. HDA involves both intrasample and intersample data transformations that can be used in a harmonious manner to expand a tiny training dataset to an arbitrarily large collection, thereby effectively and explicitly addressing the intrinsic data scarcity problem. We conducted extensive experiments to validate HDA for boosting state-of-the-art deep learning WF attack models in both closed-world and open-world attack scenarios, in the absence and presence of strong defenses. For instance, in the more challenging and realistic evaluation scenario with a WTF-PAD-based defense, our HDA method surpasses the previous state-of-the-art results by nearly 3% in classification accuracy in the 20-shot learning case. An earlier version of this work (Chen et al., 2021) was posted as a preprint on arXiv (https://arxiv.org/abs/2101.10063).
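
To make the idea of combining intrasample and intersample transformations more concrete, the following is a minimal sketch only. The abstract does not name the concrete transformations, so everything here is an assumption for illustration: traces are assumed to be fixed-length NumPy arrays of packet directions (+1, -1, with 0 as padding), and the helper names `rotate`, `mask`, `mix`, and `augment`, along with their parameters, are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np


def rotate(trace, rng, max_shift=0.2):
    """Intrasample (assumed): circularly shift the trace by a random offset."""
    limit = int(len(trace) * max_shift)
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(trace, shift)


def mask(trace, rng, max_frac=0.1):
    """Intrasample (assumed): zero out one random contiguous span."""
    out = trace.copy()
    span = int(rng.integers(0, int(len(trace) * max_frac) + 1))
    start = int(rng.integers(0, len(trace) - span + 1))
    out[start:start + span] = 0
    return out


def mix(trace_a, trace_b, rng, max_frac=0.1):
    """Intersample (assumed): paste a short random span of trace_b into trace_a.

    The span is kept short so that the label of trace_a can be retained.
    """
    out = trace_a.copy()
    span = int(rng.integers(0, int(len(trace_a) * max_frac) + 1))
    start = int(rng.integers(0, len(trace_a) - span + 1))
    out[start:start + span] = trace_b[start:start + span]
    return out


def augment(traces, labels, n_new, seed=0):
    """Generate n_new synthetic traces from a small labelled set."""
    rng = np.random.default_rng(seed)
    new_x, new_y = [], []
    for _ in range(n_new):
        i = int(rng.integers(len(traces)))  # trace that supplies the label
        j = int(rng.integers(len(traces)))  # trace that supplies the pasted span
        x = mix(traces[i], traces[j], rng)
        x = mask(rotate(x, rng), rng)
        new_x.append(x)
        new_y.append(labels[i])
    return np.stack(new_x), np.array(new_y)
```

Because each call draws fresh random offsets and partner traces, `n_new` can be set as large as desired, which mirrors the claim that a tiny training set can be expanded to an arbitrarily large collection. Whether such synthetic traces actually help a given WF model is an empirical question that this sketch does not answer.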

Highlights

  • For privacy protection when accessing the Internet, an increasing number of users have turned to anonymous networks. The Onion Router (Tor) [1, 2] is one of the most popular choices [3]. As free and open-source software, Tor boosts anonymous communication.

  • (1) Some hand-crafted feature-based methods (CUMUL) are superior to recent deep learning methods (ResNet-34 and the convolutional neural network (CNN) model Var-CNN) in few-shot learning scenarios. This is mainly because the latter lack sufficient training samples, which results in model overfitting.

  • (2) Var-CNN + harmonious data augmentation (HDA) outperforms the other competitors by a moderate margin, e.g., a 2.9% gap over the best competitor, CUMUL. (3) Var-CNN consistently surpasses ResNet-34.

Summary

Introduction

As free and open-source software, Tor boosts anonymous communication. It directs Internet traffic through a free, worldwide, volunteer overlay network with thousands of relays, concealing a user's location and usage from anyone conducting network surveillance or traffic analysis. It encrypts the content of the communication and sends the data through a route composed of successive, randomly selected Tor nodes. Even so, Tor is not completely secure, because data transportation patterns are exposed before the traffic reaches the Tor servers. A local attacker could eavesdrop on the connection between a user and the guard node of the Tor network, with the attacking positions including any devices in the same

