Abstract
Dealing with aleatoric uncertainty is key in many domains involving sequential decision making, e.g., planning in AI, network protocols, and symbolic program synthesis. This paper presents a general-purpose, model-based framework that synthesises policies for uncertain environments in a fully automated manner. The new concept of coloured Markov Decision Processes (MDPs) enables a succinct representation of a wide range of synthesis problems: a coloured MDP describes a collection of possible policy configurations together with their structural dependencies. The framework covers the synthesis of (a) programmatic policies from probabilistic program sketches and (b) finite-state controllers representing policies for partially observable MDPs (POMDPs), including decentralised as well as constrained POMDPs. We show that all these synthesis problems can be cast as exploring memoryless policies in the corresponding coloured MDP. This exploration rests on a symbiosis of two orthogonal techniques: abstraction refinement, using a novel refinement method, and counterexample generalisation. Our approach outperforms dedicated synthesis techniques on some problems and significantly improves an earlier version of this framework.
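To make the central notion concrete, below is a minimal, hypothetical Python sketch of a coloured MDP as described in the abstract: each nondeterministic choice is guarded by "colours" (hole–option pairs), and fixing one option per hole selects a single policy configuration. All names (ColouredMDP, Colour, Choice, holes, configurations, induced_choices) and the exact data layout are illustrative assumptions for exposition, not the paper's actual formalisation or API.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Colour:
    """A (hole, option) pair guarding a choice -- an assumption for this sketch."""
    hole: str    # name of a synthesis parameter ("hole")
    option: int  # one admissible value for that hole

@dataclass
class Choice:
    action: str
    colours: frozenset  # colours that must all agree with the configuration
    transitions: dict   # successor state -> probability

@dataclass
class ColouredMDP:
    states: list
    choices: dict  # state -> list[Choice]
    holes: dict    # hole name -> list of admissible options

    def configurations(self):
        """Enumerate all policy configurations (the design space)."""
        names = list(self.holes)
        for options in product(*(self.holes[h] for h in names)):
            yield dict(zip(names, options))

    def induced_choices(self, assignment, state):
        """Choices of `state` enabled under a configuration: a choice
        survives iff every one of its colours matches the assignment."""
        return [c for c in self.choices[state]
                if all(assignment[col.hole] == col.option
                       for col in c.colours)]

if __name__ == "__main__":
    # Toy design space: two holes with two options each -> four configurations.
    cmdp = ColouredMDP(
        states=["s0"],
        choices={"s0": [
            Choice("a", frozenset({Colour("H1", 0)}), {"s0": 1.0}),
            Choice("b", frozenset({Colour("H1", 1)}), {"s0": 1.0}),
        ]},
        holes={"H1": [0, 1], "H2": [0, 1]},
    )
    for cfg in cmdp.configurations():
        print(cfg, [c.action for c in cmdp.induced_choices(cfg, "s0")])
```

Note that this sketch only enumerates configurations one by one; the framework in the paper instead reasons about whole sets of configurations at once via abstraction refinement, and uses counterexample generalisation to prune many configurations per analysis step.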