Defense decision-making in cybersecurity has increasingly relied on stochastic game models that combine game theory with a Markov decision process (MDP). However, the MDP presumes that both attackers and defenders are perfectly rational and have complete information, which greatly limits its applicability and guidance value for the defense decision-making process. The present study addresses this issue by applying a partially observable MDP (POMDP) to model attack-defense behavior and by employing a deep Q-network (DQN) algorithm built on a recurrent neural network to solve for game equilibria dynamically and intelligently under conditions of partial rationality and incomplete information. The proposed DQN method enables network defense strategies to leverage online learning to gradually approach an optimal defense strategy. The rationality and convergence of the proposed approach are demonstrated through simulations and comparative analyses of attackers and defenders engaged in distributed reflection denial-of-service (DRDoS) attacks.
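The core idea of a recurrent DQN under partial observability can be illustrated with a minimal sketch: instead of acting on the current state, the agent folds each partial observation into a recurrent hidden state that summarizes the observation history, and selects defense actions epsilon-greedily from Q-values computed on that summary. The network sizes, random (untrained) weights, and observation encoding below are all hypothetical stand-ins, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: observation features, hidden state, defense actions
OBS_DIM, HID_DIM, N_ACTIONS = 4, 8, 3

# Vanilla RNN plus a linear Q head; weights are random for illustration only
W_in = rng.normal(0, 0.1, (HID_DIM, OBS_DIM))
W_rec = rng.normal(0, 0.1, (HID_DIM, HID_DIM))
W_q = rng.normal(0, 0.1, (N_ACTIONS, HID_DIM))

def step(h, obs):
    """Fold one partial observation into the recurrent hidden state."""
    return np.tanh(W_in @ obs + W_rec @ h)

def q_values(h):
    """Estimate action values from the history summary h."""
    return W_q @ h

def select_action(h, epsilon=0.1):
    """Epsilon-greedy choice of a defense action from the Q estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(h)))

# Roll the hidden state over a short sequence of partial observations
h = np.zeros(HID_DIM)
for _ in range(5):
    obs = rng.normal(size=OBS_DIM)  # stand-in for a partial network observation
    h = step(h, obs)

action = select_action(h, epsilon=0.0)
```

In a full implementation the weights would be trained by Q-learning on replayed observation sequences; the sketch shows only why recurrence matters: the action depends on the whole observed history rather than a single (unobservable) state.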