Abstract

There is growing interest in applying Stackelberg games to model resource allocation in patrolling security problems, in which defenders must allocate limited security resources to protect targets from attack by adversaries. In the real world, adversaries are sophisticated and present dynamic strategies. Most existing approaches compute defender strategies against fixed behavioral models of adversaries and therefore cannot guarantee success when the game is realized. To address this shortcoming, this paper presents a novel approach for adapting preferred strategies in controlled Stackelberg security games using a reinforcement learning (RL) approach for attackers and defenders based on average rewards. We propose a common framework that combines prior knowledge with the temporal-difference method in reinforcement learning. The overall RL architecture involves two main components: the adaptive primary learning architecture and the actor-critic architecture. We consider a Stackelberg security game over a metric state space for a class of discrete-time ergodic controllable Markov chain games. The equilibrium point is computed using the extraproximal method. Finally, a game-theoretic example illustrates the main results and the effectiveness of the method.
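To make the RL ingredients concrete, the sketch below shows an average-reward temporal-difference actor-critic update for a single player on a small controllable Markov chain. It is a minimal illustration only: the transition matrix P, reward table R, step sizes, and the softmax policy parameterization are assumptions introduced here for demonstration, not the paper's game model or its extraproximal equilibrium computation.

import numpy as np

# Minimal average-reward actor-critic sketch on a small controllable
# Markov chain. The environment (P, R), step sizes, and softmax policy
# are illustrative assumptions, not the paper's actual game.

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Hypothetical transition probabilities P[s, a, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions))   # actor: softmax policy parameters
V = np.zeros(n_states)                    # critic: state-value estimates
rho = 0.0                                 # running average-reward estimate
alpha_v, alpha_pi, alpha_rho = 0.1, 0.05, 0.01

def softmax_policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

s = 0
for t in range(20_000):
    pi = softmax_policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Average-reward TD error: delta = r - rho + V(s') - V(s)
    delta = r - rho + V[s_next] - V[s]

    rho += alpha_rho * delta              # update average-reward estimate
    V[s] += alpha_v * delta               # critic (value) update
    grad_log = -pi
    grad_log[a] += 1.0                    # grad of log pi(a|s) for softmax
    theta[s] += alpha_pi * delta * grad_log   # actor (policy) update
    s = s_next

print("estimated average reward:", round(rho, 3))

In the two-player security setting described in the paper, one such learner would be run for the defender and one for the attacker, with the extraproximal method used to compute the Stackelberg equilibrium of the resulting game; that coupling is not shown here.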
