On the Convex Formulations of Robust Markov Decision Processes

Julien Grand-Clément, Marek Petrik

doi:10.1287/moor.2022.0284

Abstract

Robust Markov decision processes (RMDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and an exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases, RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and nonnegative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.

Funding: The work in the paper was supported, in part, by NSF [Grants 2144601 and 1815275]; and Agence Nationale de la Recherche [Grant 11-LABX-0047].
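The abstract notes that value iteration carries over from MDPs to RMDPs. As a rough illustration only, the sketch below runs robust value iteration on a toy sa-rectangular RMDP. It is not the convex formulation proposed in the paper. The function name `robust_value_iteration`, the data layout, and the toy numbers are all assumptions made for this example, and each uncertainty set is assumed to be a finite list of candidate transition distributions so that the inner minimization is a plain enumeration.

```python
from typing import List

# Assumed layout (hypothetical, for illustration only):
#   rewards[s][a]         -> immediate reward for action a in state s
#   uncertainty[s][a]     -> finite list of candidate transition distributions
#   uncertainty[s][a][k]  -> list of next-state probabilities (sums to 1)


def robust_value_iteration(
    rewards: List[List[float]],
    uncertainty: List[List[List[List[float]]]],
    gamma: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate the robust Bellman operator under sa-rectangularity.

    v(s) <- max_a [ r(s, a) + gamma * min_{p in P(s, a)} sum_s' p(s') v(s') ]
    """
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = []
        for s in range(n_states):
            best = float("-inf")
            for a in range(len(rewards[s])):
                # Adversary picks the worst candidate distribution for (s, a).
                worst = min(
                    sum(p_i * v_i for p_i, v_i in zip(p, v))
                    for p in uncertainty[s][a]
                )
                best = max(best, rewards[s][a] + gamma * worst)
            new_v.append(best)
        # Stop once successive iterates agree up to the tolerance.
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy instance: state 0 earns reward 1, state 1 earns reward 0 and is absorbing.
toy_rewards = [[1.0], [0.0]]
# Nominal model: state 0 always stays in state 0.
nominal_sets = [[[[1.0, 0.0]]], [[[0.0, 1.0]]]]
# Robust model: from state 0 the adversary may also send us to state 1.
robust_sets = [[[[1.0, 0.0], [0.0, 1.0]]], [[[0.0, 1.0]]]]

nominal_values = robust_value_iteration(toy_rewards, nominal_sets, gamma=0.5)
robust_values = robust_value_iteration(toy_rewards, robust_sets, gamma=0.5)
print("nominal:", nominal_values)  # state 0 value is close to 1 / (1 - 0.5) = 2
print("robust: ", robust_values)   # state 0 value drops to 1 under the adversary
```

In this toy instance the nominal kernel belongs to the robust uncertainty set, so the robust values can never exceed the nominal ones. Enumerating a finite set is the simplest possible inner problem; the polyhedral, ellipsoidal, and entropy-based sets mentioned in the abstract would need an optimization solver for that step.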

Published in: Mathematics of Operations Research

Similar Papers

Robust Average-Reward Markov Decision Processes
Yue Wang ... Alvaro Velasquez
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023

Robust decomposable Markov decision processes motivated by allocating school budgets
Nedialko B Dimitrov ... Stefanka Chukova
European Journal of Operational Research | VOL. 239
17 May 2014

Solving Long-run Average Reward Robust MDPs via Stochastic Games
Krishnendu Chatterjee ... Đorđe Žikelić
01 Aug 2024

Robust Average-Reward Reinforcement Learning
Yue Wang ... Shaofeng Zou
Journal of Artificial Intelligence Research | VOL. 80
16 Jun 2024
