Abstract

Given a Markov game, it is often possible to hand-code or learn a set of policies that capture a diversity of possible strategies. It is also often possible to hand-code or learn an abstract simulator of the game that can estimate the outcome of playing two strategies against one another from any state. We consider how to use such policy sets and simulators to make decisions in large Markov games such as real-time strategy (RTS) games. Prior work has addressed this problem with an approach we call minimax policy switching: at each decision epoch, all policy pairs are simulated against each other from the current state, and the minimax policy is chosen and used to select actions until the next decision epoch. Although this approach is intuitively appealing, our first contribution is to show that the resulting switching policy can have arbitrarily poor worst-case performance. Our second contribution is a simple modification whose worst-case performance is provably no worse than that of the minimax fixed policy in the set. Our final contribution is a set of experiments with these algorithms in the domain of RTS games, using both an abstract game engine that we can simulate exactly and a real game engine that we can simulate only approximately. The results show that policy switching is effective when the simulator is accurate, and they highlight the challenges that arise when simulations are inaccurate.
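As a rough illustration of the decision rule described above, the following sketch picks, at a single decision epoch, the policy whose worst simulated outcome over the opponent's policies is largest. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `minimax_policy_index`, `simulate`, and the toy payoff table are hypothetical, the simulator is assumed to return a scalar value from our point of view, and only pure (non-randomized) policy choices are considered. The modification proposed in the paper is not shown.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Policy = TypeVar("Policy")


def minimax_policy_index(
    state: State,
    our_policies: Sequence[Policy],
    opp_policies: Sequence[Policy],
    simulate: Callable[[State, Policy, Policy], float],
) -> int:
    """Return the index of our policy with the best worst-case simulated value.

    `simulate(state, ours, theirs)` is a hypothetical stand-in for the abstract
    simulator; it is assumed to return the estimated outcome for us.
    Both policy sets are assumed to be non-empty.
    """
    best_index, best_value = 0, float("-inf")
    for i, ours in enumerate(our_policies):
        # Worst outcome for us if the opponent best-responds to `ours`.
        worst = min(simulate(state, ours, theirs) for theirs in opp_policies)
        if worst > best_value:
            best_index, best_value = i, worst
    return best_index


# Toy example: an assumed simulator backed by a fixed payoff table.
PAYOFFS = {
    ("rush", "rush"): 0.0,
    ("rush", "defend"): -1.0,
    ("defend", "rush"): 1.0,
    ("defend", "defend"): 0.5,
}


def toy_simulate(state: object, ours: str, theirs: str) -> float:
    # The state is ignored here; a real simulator would roll the game forward.
    return PAYOFFS[(ours, theirs)]


policies = ["rush", "defend"]
choice = minimax_policy_index(None, policies, policies, toy_simulate)
```

In a switching agent, this selection would be repeated from the current state at every decision epoch, and the chosen policy would select actions until the next epoch.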
