Deep neural networks (DNNs) are fundamental to modern applications such as face recognition and autonomous driving. However, their security is a significant concern due to integrity risks such as backdoor attacks, in which compromised training data introduce malicious behaviors into the DNN that can be exploited during inference or deployment. This paper presents a novel game-theoretic approach to modeling the interaction between an attacker and a defender in the context of a DNN backdoor attack. The contributions are fourfold. First, the interaction between the attacker and the defender is formalized as a game-theoretic model. Second, a utility function is designed that captures the objectives of both parties by integrating clean-data accuracy and attack success rate. Third, the game is reduced to a two-player zero-sum game, allowing Nash equilibrium points to be identified through linear programming and the equilibrium strategies to be analyzed in depth. Fourth, the framework admits varying levels of control afforded to each player, thereby representing a range of real-world scenarios. Extensive numerical simulations validate the proposed framework and identify insightful equilibrium points that guide both players toward their optimal strategies under different assumptions. The results indicate that fully exploiting attack or defense capabilities is not always optimal for either party: attackers must balance inducing errors against minimizing the information conveyed to the defender, while defenders should minimize attack risk while preserving performance on benign samples. These findings underscore the effectiveness and versatility of the proposed approach, showcasing optimal strategies across different game scenarios and highlighting its potential to enhance DNN security against backdoor attacks.
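As a minimal illustration of the equilibrium computation the abstract describes (not the paper's actual model), the sketch below solves a two-player zero-sum game for its Nash equilibrium via linear programming. The payoff matrix, the attacker and defender strategy sets, and the interpretation of entries as a weighted mix of attack success rate and clean-data accuracy loss are all hypothetical placeholders.

```python
# Sketch: Nash equilibrium of a two-player zero-sum game via linear
# programming. The matrix below is hypothetical, not from the paper.
import numpy as np
from scipy.optimize import linprog

# Hypothetical payoff matrix: rows = attacker strategies (e.g., poisoning
# rates), columns = defender strategies (e.g., filtering intensities).
# Entry (i, j) is the attacker's utility under strategy pair (i, j).
A = np.array([
    [0.8, 0.3, 0.1],
    [0.6, 0.5, 0.2],
    [0.4, 0.4, 0.3],
])
m, n = A.shape

# Attacker's LP: choose a mixed strategy x maximizing the game value v,
# such that every defender response yields payoff at least v.
# Decision variables: [x_1, ..., x_m, v].
c = np.zeros(m + 1)
c[-1] = -1.0                       # linprog minimizes, so minimize -v
# For each defender column j: v - sum_i A[i, j] * x_i <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# The mixed strategy must sum to 1
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]  # probabilities in [0,1], v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_attacker, game_value = res.x[:m], res.x[-1]
print("attacker mixed strategy:", np.round(x_attacker, 3))
print("game value:", round(game_value, 3))
```

The defender's equilibrium strategy is obtained symmetrically (or from the dual of the same LP), and equilibria that mix strategies, rather than fully committing to the strongest attack or defense, mirror the abstract's finding that maximal capability use is not always optimal.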