Abstract

In black-box adversarial attacks, attackers query the deep neural network (DNN) and use the query results to optimize the adversarial samples iteratively. In this paper, we study the method of adding white noise to the DNN output to mitigate such attacks. One of our unique contributions is a theoretical analysis of the gradient signal-to-noise ratio (SNR), which shows the trade-off between the defense noise level and the attack query cost. The attacker’s query count (QC) is derived mathematically as a function of the noise standard deviation. This guides the defender in choosing the appropriate noise level to mitigate attacks to a desired security level, specified by the QC and the DNN performance loss. Our analysis shows that the added noise is drastically magnified by the small variation of DNN outputs, which makes the reconstructed gradient have an extremely low SNR. Adding white noise with a very small standard deviation, e.g., less than 0.01, is enough to increase the QC by many orders of magnitude without any noticeable reduction in classification accuracy. Our experiments demonstrate that this method can effectively mitigate both soft-label and hard-label black-box attacks under realistic QC constraints. We also show that this method outperforms many other defense methods and is robust to the attacker’s countermeasures.
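
As a rough illustration of the defense described above, the sketch below adds zero-mean white Gaussian noise to a classifier's returned probabilities before they are sent back to the querier. This is a minimal sketch under assumptions: the function name, the default sigma, and the clip-and-renormalize step are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def noisy_output(logits, sigma=0.01, rng=None):
    """Return class probabilities perturbed by zero-mean white Gaussian noise.

    sigma is the defense noise standard deviation; the abstract argues that
    even sigma < 0.01 can raise the attacker's query count by orders of
    magnitude while leaving classification accuracy essentially unchanged.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Numerically stable softmax over the raw logits.
    z = logits - np.max(logits)
    probs = np.exp(z) / np.sum(np.exp(z))
    # Add i.i.d. white Gaussian noise to every returned score.
    noisy = probs + rng.normal(0.0, sigma, size=probs.shape)
    # Clip and renormalize so the response still resembles a probability
    # vector (an implementation choice, not specified by the paper).
    noisy = np.clip(noisy, 0.0, 1.0)
    return noisy / np.sum(noisy)

# Example: for well-separated classes the top-1 prediction is unaffected.
logits = np.array([2.0, 0.5, -1.0])
print(np.argmax(noisy_output(logits)))  # almost always 0
```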

Highlights

  • Along with the rapid development of deep neural networks (DNNs), many online services, such as the Clarifai API, Google Photos, advertisement detection, and fake news filtering, rely heavily on DNNs.

  • An intriguing issue is that DNNs are highly susceptible to small variations in input data [1].

  • White-box attacks assume that the attackers have complete knowledge of the deep network, while black-box attacks assume that the attackers have limited knowledge, typically some output information of the DNNs.

Summary

Introduction

Along with the rapid development of deep neural networks (DNNs), many online services, such as the Clarifai API, Google Photos, advertisement detection, and fake news filtering, rely heavily on DNNs. Online DNN servers suffer from adversarial attacks in which the attackers slightly change the input data to make DNNs produce false results or misclassifications [2]. Depending on the knowledge about the DNNs that the attackers have, adversarial attacks can be classified into white-box attacks [1], [3]–[5] and black-box attacks [6]–[13]. The former assumes that the attackers have complete knowledge of the deep network, while the latter assumes that the attackers have limited knowledge, typically some output information of the DNNs. Compared with white-box attacks, black-box attacks are a more realistic threat to real-world practical applications.
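
To make the black-box setting concrete, the following sketch (not taken from the paper) shows the zeroth-order, finite-difference gradient estimation that a typical soft-label black-box attacker performs with query access only; it is exactly this reconstructed gradient whose SNR collapses when the defender adds output noise. The `query` callable, `delta`, and `n_samples` are hypothetical placeholders.

```python
import numpy as np

def estimate_gradient(query, x, label, delta=1e-3, n_samples=50, rng=None):
    """Estimate the gradient of the target-class score w.r.t. the input x.

    `query(x)` stands for one black-box call that returns the model's class
    probabilities (possibly perturbed by the defender's white noise). Each
    pair of calls measures a tiny score difference, so additive output noise
    of comparable magnitude drowns out the signal and forces far more queries.
    """
    rng = rng if rng is not None else np.random.default_rng()
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        # Random unit direction for a two-sided finite difference.
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)
        diff = query(x + delta * u)[label] - query(x - delta * u)[label]
        grad += (diff / (2.0 * delta)) * u
    return grad / n_samples
```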
