Abstract

Robust Markov decision processes (MDPs) aim to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. Existing studies have mostly focused on robust MDPs under the discounted-reward criterion, leaving those under the average-reward criterion largely unexplored. In this paper, we develop the first comprehensive and systematic study of robust average-reward MDPs, where the goal is to optimize the long-term average performance under the worst case. Our contributions are fourfold: (1) we prove the uniform convergence of the robust discounted value function to the robust average-reward function as the discount factor γ goes to 1; (2) we derive the robust average-reward Bellman equation, characterize the structure of its solution set, and prove the equivalence between solving the robust Bellman equation and finding the optimal robust policy; (3) we design robust dynamic programming algorithms and theoretically characterize their convergence to the optimal policy; and (4) we design two model-free algorithms utilizing the multi-level Monte Carlo approach and prove their asymptotic convergence.
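To make the dynamic-programming contribution (3) concrete, the sketch below shows a relative-value-iteration-style update for a robust average-reward Bellman operator under an (s,a)-rectangular L1 uncertainty set around a nominal transition kernel. In standard notation, the robust average-reward Bellman equation takes the form g + V(s) = max_a { r(s,a) + σ_{P_{s,a}}(V) }, where σ_{P_{s,a}}(V) is the worst-case expected next-state value over the uncertainty set. This is a minimal illustrative sketch, not necessarily the paper's exact algorithm; the function and parameter names (robust_relative_value_iteration, P_nominal, radius) are hypothetical, and convergence of such iterations generally requires additional aperiodicity/unichain-type conditions.

```python
import numpy as np

def robust_relative_value_iteration(P_nominal, r, radius, n_iters=10_000, tol=1e-8):
    """Illustrative robust relative value iteration for average-reward MDPs.

    P_nominal: nominal transition kernel, shape (S, A, S).
    r: reward table, shape (S, A).
    radius: L1 radius of the (s, a)-rectangular uncertainty set around P_nominal.
    Returns an estimate of the optimal robust gain, a relative value function,
    and a greedy policy.
    """
    S, A, _ = P_nominal.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    ref = 0      # reference state used to normalize the relative values
    gain = 0.0

    def worst_case_expectation(p_nom, V):
        # min_{||p - p_nom||_1 <= radius} p @ V over an L1 ball:
        # shift up to radius/2 of probability mass from the highest-value
        # next states onto the lowest-value next state.
        order = np.argsort(V)                 # ascending in V
        p = p_nom.copy()
        budget = radius / 2.0
        for s_hi in order[:0:-1]:             # highest-value states first, skip the worst
            take = min(p[s_hi], budget)
            p[s_hi] -= take
            budget -= take
            if budget <= 0.0:
                break
        p[order[0]] += radius / 2.0 - budget  # deposit the removed mass on the worst state
        return p @ V

    for _ in range(n_iters):
        for s in range(S):
            for a in range(A):
                Q[s, a] = r[s, a] + worst_case_expectation(P_nominal[s, a], V)
        TV = Q.max(axis=1)      # robust Bellman operator applied to V
        gain = TV[ref]          # running estimate of the optimal robust gain
        V_new = TV - gain       # relative values, pinned at the reference state
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new

    return gain, V, Q.argmax(axis=1)
```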
