Abstract

Facing increasingly intricate network structures and a multitude of security threats, cyber deception defenders often employ deception assets to safeguard critical real assets. However, against intranet lateral-movement attackers in the cyber kill chain, the deployment of deception assets faces several challenges: a lack of dynamics, an inability to make real-time decisions, and a failure to account for dynamic changes in the attacker's strategy. To address these issues, this study introduces a novel maze pathfinding model tailored to the lateral-movement context, in which we attempt to locate the attacker so that deception assets can be deployed accurately for interception. The attack–defense process is modeled as a multi-agent stochastic game; after comparison with a random action policy and the Minimax-Q algorithm, we choose Nash Q-learning to solve for the deception-asset deployment strategy and obtain the best solution quality. Extensive simulation tests reveal that the proposed model exhibits good convergence properties. Moreover, the average defense success rate surpasses 70%, attesting to the model's efficacy.
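To make the attack–defense game concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: an attacker moves through a small grid "maze" toward a real asset while a defender places a deception asset each step; the zero-sum interaction is learned with a Nash-Q-style update whose stage-game value is approximated here by pure-strategy minimax (a simplification of a full mixed-strategy Nash solve). The grid size, rewards, and hyperparameters are all assumed values for illustration.

```python
import random

random.seed(0)

N = 3                       # assumed 3x3 grid maze
GOAL = (2, 2)               # real asset the attacker tries to reach
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]           # attacker moves
CELLS = [(r, c) for r in range(N) for c in range(N)]   # decoy placements

def step(att, d_cell, a_move):
    """One stage: attacker moves; the decoy intercepts on collision."""
    nxt = (min(max(att[0] + a_move[0], 0), N - 1),
           min(max(att[1] + a_move[1], 0), N - 1))
    if nxt == d_cell:                 # intercepted by deception asset
        return nxt, -1.0, True
    if nxt == GOAL:                   # attacker reaches the real asset
        return nxt, +1.0, True
    return nxt, 0.0, False            # episode continues

# Q[state][(a_move, d_cell)] = attacker payoff (zero-sum: defender gets -Q)
Q = {s: {(a, d): 0.0 for a in ACTIONS for d in CELLS} for s in CELLS}

def stage_value(s):
    """Zero-sum stage-game value via pure-strategy minimax (approximation)."""
    return min(max(Q[s][(a, d)] for a in ACTIONS) for d in CELLS)

alpha, gamma, eps = 0.2, 0.9, 0.3     # assumed hyperparameters
for _ in range(3000):
    s = (0, 0)                        # attacker's entry point
    for _ in range(20):
        if random.random() < eps:     # joint exploration
            a = random.choice(ACTIONS)
            d = random.choice(CELLS)
        else:                         # each side plays its minimax choice
            a = max(ACTIONS, key=lambda a: min(Q[s][(a, d)] for d in CELLS))
            d = min(CELLS, key=lambda d: max(Q[s][(a, d)] for a in ACTIONS))
        s2, r_att, done = step(s, d, a)
        target = r_att if done else r_att + gamma * stage_value(s2)
        Q[s][(a, d)] += alpha * (target - Q[s][(a, d)])
        if done:
            break
        s = s2
```

In this toy setting the defender's greedy placement tends toward cells on the attacker's shortest paths to the goal, which mirrors the abstract's idea of locating the attacker and deploying deception assets for interception; the paper's actual model operates on a lateral-movement graph with richer state.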

