Abstract

We consider the problem of constructing control policies that are robust against errors in the distribution of the model parameters of Markov decision processes. The ambiguity set of admissible distributions is modeled using the Wasserstein metric. We prove the existence and optimality of Markov policies and develop convex optimization-based tools to compute and analyze them. Our methods, which are based on the Kantorovich convex relaxation and duality principle, offer three advantages. First, when the nominal distribution has finite support, the proposed dual formulation of the associated Bellman equation resolves the infinite dimensionality inherent in its original formulation. Second, our duality analysis identifies the structure of a worst-case distribution and provides a simple decentralized method for its construction. Third, we develop a sensitivity analysis tool that quantifies the effect of ambiguity set parameters on the performance of distributionally robust policies. The effectiveness of the proposed tools is demonstrated through a human-centered air conditioning problem.
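
As a rough illustration of the kind of duality the abstract refers to, the following is the standard strong-duality identity for a worst-case expectation over a Wasserstein ball centered at a finitely supported nominal distribution. The notation (f, \nu, \hat{w}_i, N, \theta, p, d, \lambda, \mathcal{W}) is assumed here for illustration and is not taken from the paper; the paper's Bellman-equation version may differ in form and in the conditions it requires.

```latex
% Assumed notation (not from the paper):
%   \nu = \frac{1}{N}\sum_{i=1}^{N} \delta_{\hat{w}_i}  nominal distribution with finite support
%   W_p      Wasserstein metric of order p induced by the distance d on \mathcal{W}
%   \theta   radius of the ambiguity set
%   f        cost as a function of the uncertain parameter w
\[
\sup_{\mu \,:\, W_p(\mu,\nu) \le \theta} \mathbb{E}_{\mu}\big[f(w)\big]
\;=\;
\inf_{\lambda \ge 0}
\Big\{
  \lambda\,\theta^{p}
  + \frac{1}{N}\sum_{i=1}^{N}
    \sup_{w \in \mathcal{W}}
    \big[\, f(w) - \lambda\, d(w,\hat{w}_i)^{p} \,\big]
\Big\}
\]
```

Under suitable regularity conditions on f, the left-hand side optimizes over probability measures, which is an infinite-dimensional problem, while the right-hand side optimizes over a single scalar \lambda together with N separate inner maximizations, one per support point of the nominal distribution. That separation is consistent with the decentralized construction of a worst-case distribution described above, although the exact construction used in the paper is not shown here.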
