Abstract
Given a Markov game, it is often possible to hand-code or learn a set of policies that capture a diversity of possible strategies. It is also often possible to hand-code or learn an abstract simulator of the game that can estimate the outcome of playing two strategies against one another from any state. We consider how to use such policy sets and simulators to make decisions in large Markov games such as real-time strategy (RTS) games. Prior work has considered the problem using an approach we call minimax policy switching. At each decision epoch, all policy pairs are simulated against each other from the current state, and the minimax policy is chosen and used to select actions until the next decision epoch. While intuitively appealing, our first contribution is to show that this switching policy can have arbitrarily poor worst-case performance. Our second contribution is to describe a simple modification whose worst-case performance is provably no worse than that of the minimax fixed policy in the set. Our final contribution is to conduct experiments with these algorithms in the domain of RTS games, using both an abstract game engine that we can exactly simulate and a real game engine that we can only approximately simulate. The results show the effectiveness of policy switching when the simulator is accurate, and highlight challenges in the face of inaccurate simulations.
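To make the switching rule described in the abstract concrete, the following Python sketch shows one way a minimax policy could be selected at each decision epoch from a candidate policy set and an abstract simulator. It is only an illustration of the general idea, not the paper's implementation; the names `simulate_outcome`, `our_policies`, and `opponent_policies` are hypothetical.

```python
# Illustrative sketch of minimax policy switching (all names are hypothetical).
# Assumes an abstract simulator simulate_outcome(state, pi, sigma) that
# estimates the value, to the maximizing player, of playing policy pi
# against opponent policy sigma from the given state onward.

def minimax_policy_switching(state, our_policies, opponent_policies, simulate_outcome):
    """Return the policy from our_policies whose worst-case simulated
    outcome against opponent_policies is largest (the minimax choice)."""
    best_policy, best_worst_case = None, float("-inf")
    for pi in our_policies:
        # Worst-case simulated value of pi over all candidate opponent policies.
        worst_case = min(simulate_outcome(state, pi, sigma)
                         for sigma in opponent_policies)
        if worst_case > best_worst_case:
            best_policy, best_worst_case = pi, worst_case
    return best_policy

# At each decision epoch, a controller would call
#   minimax_policy_switching(current_state, Pi, Sigma, simulate_outcome)
# and follow the returned policy until the next decision epoch.
```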