DNNs have become pervasive in many security-critical scenarios such as autonomous vehicles and medical diagnosis. Recent studies reveal the susceptibility of DNNs to various adversarial attacks, among which weight Bit-Flip Attacks (BFAs) are emerging as a significant security concern. Moreover, the Targeted Bit-Flip Attack (T-BFA), a novel variant of BFA, can stealthily alter specific source–target classifications while preserving accurate classification of non-target classes, posing a more severe threat. However, because existing defense mechanisms give inadequate consideration to T-BFA’s “targeted” characteristic, they tend to over-protect or over-modify the network, leading to significant defense overhead or a non-negligible reduction in DNN accuracy.

In this work, we propose ALERT, A Lightweight defense mechanism for Enhancing DNN Robustness against T-BFA while maintaining network accuracy. First, building on an understanding of the key factors that dominate misclassification among source–target class pairs, we propose a Source-Target-Aware Searching (STAS) method to accurately identify the weights that are vulnerable under T-BFA. Second, leveraging the intrinsic redundancy of DNNs, we propose a weight random switch mechanism that reduces the exposure of vulnerable weights, thereby weakening the expected impact of T-BFA. To balance robustness enhancement against network accuracy, we develop a metric for selecting candidate weights. Finally, to further enhance DNN robustness, we present a lightweight runtime monitoring mechanism that detects T-BFA through weight signature verification and dynamically optimizes the weight random switch strategy accordingly. Evaluation results demonstrate that our proposed method effectively enhances the robustness of DNNs against T-BFA while maintaining network accuracy.
Compared with the baseline, our method can tolerate 6.7× more flipped bits with negligible accuracy loss (<0.1% in ResNet-50).
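
To make the idea of weight signature verification concrete, the following is a minimal sketch, not the scheme used in ALERT. It assumes 8-bit quantized weights stored in two's complement and uses a CRC-32 checksum per weight group as a stand-in signature; the helper names `flip_bit`, `group_signatures`, and `tampered_groups`, the group size, and the sample weights are all hypothetical and chosen only for illustration.

```python
import zlib
from typing import List


def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit weight stored in two's complement."""
    raw = (weight & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


def group_signatures(weights: List[int], group_size: int) -> List[int]:
    """Compute one CRC-32 signature per consecutive group of weights.

    CRC-32 is an assumption made for this sketch; the abstract does not
    specify how signatures are built.
    """
    signatures = []
    for start in range(0, len(weights), group_size):
        group = bytes(w & 0xFF for w in weights[start:start + group_size])
        signatures.append(zlib.crc32(group))
    return signatures


def tampered_groups(weights: List[int], reference: List[int],
                    group_size: int) -> List[int]:
    """Return indices of groups whose current signature differs from the reference."""
    current = group_signatures(weights, group_size)
    return [i for i, (cur, ref) in enumerate(zip(current, reference)) if cur != ref]


# Hypothetical quantized weights and their reference signatures.
weights = [12, -7, 33, 5, -128, 90, 1, -1]
reference = group_signatures(weights, group_size=4)

# Simulate a bit flip on the most significant bit of one weight.
attacked = list(weights)
attacked[2] = flip_bit(attacked[2], 7)

print(tampered_groups(attacked, reference, group_size=4))
```

In this sketch a single flipped bit changes the checksum of the group that contains it, so a runtime monitor only needs to recompute and compare per-group signatures rather than inspect every weight individually.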