Learning in Hide-and-Seek

Qingsi Wang, Mingyan Liu

doi:10.1109/tnet.2015.2412946

Abstract

Existing work on pursuit-evasion problems typically either assumes stationary or heuristic behavior of one side and examines countermeasures of the other, or assumes both sides to be strategic, which leads to a game-theoretic framework. Results from the former often lack robustness against changes in the adversarial behavior, while those from the latter, typically in the form of equilibrium solution concepts, may be difficult to justify, either because of the implied knowledge of other players' actions and beliefs (and knowledge of their knowledge), or because of a lack of efficient dynamics to achieve such equilibria. In this paper, we take a different approach by assuming an intelligent pursuer/evader that adapts to the information available to it and is capable of learning over time with performance guarantees. Within this context we investigate two cases. In the first case we assume that either the evader or the pursuer is aware of the type of learning algorithm used by the opponent, while in the second case neither side has such information and thus must try to learn. We show that the optimal policies in the first case have a greedy nature. This result is then used to assess the performance of the learning algorithms that both sides employ in the second case, which are shown to be mutually optimal: neither side incurs a loss compared to the case in which it knows perfectly the adaptive pattern used by the adversary and responds optimally. We further extend our model to study the application of jamming defense.

Full Text