Abstract

Backdoor attacks on deep learning (DL) models have emerged as one of the most worrisome threats to their secure and safe use, especially in security-sensitive tasks. Great effort has been devoted to thwarting backdoor attacks by devising detection or prevention countermeasures. By default, these countermeasures are designed and evaluated on models with full-precision parameters (e.g., float32). It is unclear whether they remain applicable to quantized models, which are pervasively deployed on mobile and Internet of Things (IoT) devices to save resources (i.e., power and memory) and to reduce latency and privacy risks. This work, for the first time, initiates a critical examination of the robustness and applicability of existing state-of-the-art (SOTA) DL backdoor defenses for detecting or preventing backdoor attacks on quantized models. Based on extensive evaluations of four representative defenses (Neural Cleanse, ABS, Fine-Pruning, and Trojan Signature) on three datasets (CIFAR10, GTSRB, and STL10), we find that only Neural Cleanse's defensive robustness is generally independent of model quantization, while all the others exhibit degraded effectiveness or outright failure against quantized models (in particular, the widely used int-8 and 1-bit models), especially when the model is quantized to 1-bit. The main identified reason for failure is that these defenses examine the model's weight values or the neurons' activation values to identify or prevent the backdoor, often using ranking as a step. Quantization with a small bit width yields less fine-grained discrete values (e.g., 1-bit quantization admits only the two values -1 and +1), so the effectiveness of ranking deteriorates. Note that quantization applies not only to weights but also to activations, making these defenses less robust or causing them to trivially fail.
This work highlights the need to devise backdoor defenses that remain effective across different quantization formats, not only on the default full-precision model.
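The ranking-degeneracy argument above can be illustrated with a minimal sketch (not from the paper's artifacts; the array sizes and the sign-based quantizer are illustrative assumptions): ranking weights or activations by magnitude is informative for full-precision values, but after 1-bit (sign) quantization every magnitude ties at 1, so the ranking a defense relies on carries no information.

```python
import numpy as np

# Illustrative full-precision weights (float32-style values).
rng = np.random.default_rng(0)
full_precision = rng.normal(size=8)

def quantize_1bit(w):
    """Sign quantization: maps each weight to -1 or +1 (illustrative)."""
    return np.where(w >= 0, 1.0, -1.0)

binarized = quantize_1bit(full_precision)

# Ranking by |w| distinguishes weights in full precision...
fp_rank = np.argsort(np.abs(full_precision))
assert len(np.unique(np.abs(full_precision))) == 8  # all magnitudes distinct

# ...but is degenerate after binarization: every magnitude is exactly 1,
# so any magnitude-based ranking of the binarized weights is arbitrary.
assert np.allclose(np.abs(binarized), 1.0)
print(np.unique(np.abs(binarized)))  # → [1.]
```

The same collapse affects activation-based rankings when activations are quantized, which is why defenses built on such rankings degrade most severely at 1-bit.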
