Can Adversarial Weight Perturbations Inject Neural Backdoors?

Siddhant Garg, Yingyu Liang, Adarsh Kumar, Vibhor Goel

doi:10.1145/3340531.3412130

Abstract

Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent years. Thus far, the concept of an "adversarial perturbation" has been used exclusively with reference to the input space, where it denotes a small, imperceptible change that can cause an ML model to err. In this work, we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors into trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor means obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model's predictions on non-triggered inputs. From the perspective of an adversary, we characterize these adversarial perturbations as constrained to an $\ell_{\infty}$ ball around the original model weights. We introduce adversarial perturbations in the model weights using projected gradient descent on a composite loss defined over the original model's predictions and the desired trigger outcome. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.
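The weight-space attack summarized above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a linear softmax classifier in place of a DNN, reads the composite loss as cross-entropy against the original model's own predictions on clean inputs plus `lam` times cross-entropy against a target class on triggered inputs, and uses the sign-gradient form of projected gradient descent that is commonly paired with $\ell_{\infty}$ constraints. The names `inject_backdoor`, `eps`, `alpha`, `lam`, the trigger vector, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ce_grad(X, W, labels):
    # Gradient of mean cross-entropy for a linear softmax model with logits X @ W.
    P = softmax(X @ W)
    P[np.arange(len(labels)), labels] -= 1.0
    return X.T @ P / len(labels)


def inject_backdoor(W0, X, trigger, target, eps=0.05, alpha=0.01, lam=1.0, steps=100):
    """Hypothetical sketch: sign-gradient PGD on the weights, kept inside an
    l_inf ball of radius eps around W0. Assumed composite loss:
    CE(clean inputs, original predictions) + lam * CE(triggered inputs, target)."""
    y_orig = np.argmax(X @ W0, axis=1)        # predictions the model should keep
    X_trig = X + trigger                      # trigger pattern added to the input
    y_trig = np.full(len(X), target)          # desired outcome on triggered inputs
    W = W0.copy()
    for _ in range(steps):
        g = ce_grad(X, W, y_orig) + lam * ce_grad(X_trig, W, y_trig)
        W = W - alpha * np.sign(g)            # descent step on the weights
        W = W0 + np.clip(W - W0, -eps, eps)   # project back onto the l_inf ball
    return W


rng = np.random.default_rng(0)
d, k, n = 6, 3, 200
X = rng.normal(size=(n, d))
X[:, -1] = 0.0                                # clean data never uses the last feature
W0 = rng.normal(scale=0.5, size=(d, k))       # stand-in for "trained" weights
W0[-1] = 0.0                                  # unused feature carries zero weight
trigger = np.zeros(d)
trigger[-1] = 50.0                            # hypothetical trigger pattern
target = 2

W = inject_backdoor(W0, X, trigger, target)
clean_kept = np.mean(np.argmax(X @ W, axis=1) == np.argmax(X @ W0, axis=1))
attack_success = np.mean(np.argmax((X + trigger) @ W, axis=1) == target)
print(f"max |W - W0| = {np.abs(W - W0).max():.3f}")
print(f"clean predictions kept: {clean_kept:.2f}, trigger success: {attack_success:.2f}")
```

Each step moves every weight by `alpha` against the gradient sign and then clips the accumulated change to `[-eps, eps]`, so the returned weights never leave the $\ell_{\infty}$ ball around `W0`. Because clean inputs leave the last feature at zero, only the triggered term drives the weights on that feature, which lets the trigger shift predictions without directly changing clean logits. The printed fractions depend on the synthetic data and are not results from the paper.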

Similar Papers

Generative UAP attacks against deep‐learning based modulation classification
Xiong Li ... Shaoping Chen
IET Communications | VOL. 17
18 Apr 2023

Novel Exploit Feature-Map-Based Detection of Adversarial Attacks
Ali Saeed Almuflih ... Viral V Kapdia
Applied Sciences | VOL. 12
20 May 2022

Adversarial Training with Orthogonal Regularization
Oğuz Kaan Yüksel ... İnci Meliha Baytaş
05 Oct 2020

Crafting adversarial example with adaptive root mean square gradient on deep neural networks
Yatie Xiao ... Bo Liu
Neurocomputing | VOL. 389
25 Jan 2020