One-sided two-player zero-sum partially observable stochastic games (OTZ-POSGs) have recently emerged as popular models, especially in the cyber-security literature. In such a game, the state is hidden from one player but directly observed by the other, whom we call the informed player. All existing OTZ-POSG models assume that the observation space is discrete and that the action of the informed player is private. However, these assumptions may not hold in real applications. For example, in virtual machine migration techniques for moving target defense, the observed network traffic data lies in a continuous space, and the attacker can infer the defender's switching strategy. This paper therefore proposes a continuous-observation OTZ-POSG with public actions and studies the existence of its equilibrium. The main challenge introduced by public actions is potential information leakage: the action of the informed player can reveal state information because that player's policy is state-dependent. To address this issue, we adopt a two-step belief update strategy for the players and prove the existence of a Stackelberg equilibrium. We show that the game can be solved iteratively through value iteration. However, computing the exact value function is impractical because the observation space is continuous. To reduce the computational complexity, we propose a point-based algorithm that approximates the exact value function, and we extend the dynamic partitioning approach to discretize the observation space into a finite number of partitions. We show that the value function of the leader can be approximated by a piecewise linear and concave function with an error bound. Examples in each section illustrate the ideas behind the proposed algorithms.
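To make the information-leakage point concrete, the following is a minimal sketch of a generic two-step Bayesian belief update, not the paper's own formulation. It assumes a finite state set, a state-dependent policy for the informed player, a transition matrix for the chosen action, and a Gaussian observation density. All names (`update_on_action`, `update_on_observation`, `gaussian_density`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def update_on_action(belief, policy, action):
    """Step 1: condition the belief on the informed player's public action.

    belief: dict mapping state -> probability
    policy: dict mapping state -> dict mapping action -> probability
    """
    weights = {s: belief[s] * policy[s].get(action, 0.0) for s in belief}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("action has zero probability under the current belief")
    return {s: w / total for s, w in weights.items()}


def gaussian_density(x, mean, std):
    """Density of a normal distribution, used as an assumed observation model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_on_observation(belief, transition, likelihood, observation):
    """Step 2: push the belief through the transition, then weight by the
    likelihood of the continuous observation in each next state.

    transition: dict mapping state -> dict mapping next state -> probability
    likelihood: function (observation, state) -> density value
    """
    states = list(belief)
    predicted = {
        s2: sum(belief[s] * transition[s][s2] for s in states) for s2 in states
    }
    weights = {s2: predicted[s2] * likelihood(observation, s2) for s2 in states}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under the prediction")
    return {s2: w / total for s2, w in weights.items()}


# Hypothetical two-state example: state 1 makes "switch" much more likely,
# so seeing "switch" shifts the uninformed player's belief toward state 1.
prior = {0: 0.5, 1: 0.5}
policy = {0: {"stay": 0.9, "switch": 0.1}, 1: {"stay": 0.2, "switch": 0.8}}
after_action = update_on_action(prior, policy, "switch")

transition = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
means = {0: 0.0, 1: 2.0}
posterior = update_on_observation(
    after_action,
    transition,
    lambda obs, s: gaussian_density(obs, means[s], 1.0),
    observation=1.5,
)
```

In this sketch the first step alone moves the belief in state 1 from 0.5 to 8/9, which is the leakage a public, state-dependent action causes. The second step then folds in the continuous observation through its density rather than through a discrete observation table.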