Abstract

Researchers have become increasingly interested in security games in recent decades as a result of their successful application to real-world security problems. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, respond with the anticipated attacking strategy. Existing applications, however, do not account for the time constraints imposed by the players' travel times. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes regularly, allowing them to perform more effectively. To address these issues, this research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented with a temporal-difference method that takes prior information into account. We model the SSG as a controlled, ergodic continuous-time Markov game, assuming that all information in the game framework is available. Transition rates are estimated by dividing the number of transitions observed over a time interval by the total holding time, and the costs for defenders and attackers are estimated as the arithmetic mean of the observed costs of the individual players. An iterated proximal/gradient approach is used to compute the SSG equilibrium point, and a continuous-time random walk method is provided for implementing the game. We evaluate the performance of the proposed RL security solution in a numerical case related to rain-forest hazards and discuss the issues to be considered in future work.
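The abstract describes two empirical estimators: transition rates obtained as the number of observed transitions divided by the total holding time, and player costs obtained as the arithmetic mean of observed costs. The sketch below is a minimal illustration of these two estimators under assumed conventions; the trajectory layout, function names, and data structures are hypothetical and not taken from the paper.

```python
# Hypothetical sketch, assuming a trajectory is recorded as a list of tuples
# (state, holding_time, next_state, defender_cost, attacker_cost).
from collections import defaultdict

def estimate_transition_rates(trajectory):
    """Rate(i -> j) = (# of observed i -> j transitions) / (total holding time in state i)."""
    counts = defaultdict(int)      # (i, j) -> number of observed transitions
    holding = defaultdict(float)   # i -> total time spent in state i
    for state, dwell, next_state, _, _ in trajectory:
        counts[(state, next_state)] += 1
        holding[state] += dwell
    return {ij: n / holding[ij[0]] for ij, n in counts.items() if holding[ij[0]] > 0}

def estimate_costs(trajectory):
    """Per-state defender/attacker costs as the arithmetic mean of the observed costs."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # state -> [sum_def, sum_att, visits]
    for state, _, _, c_def, c_att in trajectory:
        acc = sums[state]
        acc[0] += c_def
        acc[1] += c_att
        acc[2] += 1
    return {s: (a[0] / a[2], a[1] / a[2]) for s, a in sums.items()}

# Illustrative usage on a tiny synthetic trajectory (states "A" and "B"):
traj = [("A", 1.5, "B", 2.0, 1.0), ("B", 0.5, "A", 3.0, 0.5), ("A", 2.0, "B", 1.0, 1.5)]
print(estimate_transition_rates(traj))   # e.g. ("A", "B") rate = 2 / 3.5
print(estimate_costs(traj))              # per-state mean (defender, attacker) costs
```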

