Abstract

This paper studies the synthesis of control policies for an agent that has to satisfy a temporal logic specification in a partially observable environment, in the presence of an adversary. The interaction of the agent (defender) with the adversary is modeled as a partially observable stochastic game. The goal is to generate a defender policy to maximize satisfaction of a given temporal logic specification under any adversary policy. The search for policies is limited to the space of finite state controllers, which leads to a tractable approach to determine policies. We relate the satisfaction of the specification to reaching (a subset of) recurrent states of a Markov chain. We present an algorithm to determine a set of defender and adversary finite state controllers of fixed sizes that will satisfy the temporal logic specification, and prove that it is sound. We then propose a value-iteration algorithm to maximize the probability of satisfying the temporal logic specification under finite state controllers of fixed sizes. Lastly, we extend this setting to the scenario where the size of the finite state controller of the defender can be incre

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call