Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancies between simulated and real-world environments in Reinforcement Learning (RL) training. RMDP accommodates the uncertainty of real-world environments by adopting a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state–action spaces. Our work focuses on model-free robust RL and proposes an algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters obey specific stochastic distributions. We present a novel approach, RPC, which estimates the robust value function using generalized Polynomial Chaos (gPC), and we provide a proof that guarantees the convergence of the algorithm. Our training framework is based on off-policy RL, which reduces the computational overhead introduced by gPC and improves learning stability. The algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments to evaluate its performance on a continuous robot control task; the experimental results provide further evidence of the robustness of our algorithm.
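To make the gPC idea concrete, the following is a minimal sketch (not the paper's RPC implementation) of how a value estimate under a stochastically distributed uncertainty parameter can be expanded in an orthogonal polynomial basis, so that its mean and variance, and hence a conservative estimate, are obtained from the expansion coefficients. The function `q_under_model` and the `mean - std` robustness rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial

# Hypothetical stand-in for a critic's Q-value as a function of an uncertain
# transition parameter theta ~ N(0, 1); in the paper's setting this would be
# the value evaluated under a model drawn from the parameterized uncertainty set.
def q_under_model(theta, s=0.5, a=0.1):
    return np.cos(s + a + 0.3 * theta) - 0.05 * theta**2

order = 4                                    # truncation order of the gPC expansion
nodes, weights = He.hermegauss(order + 1)    # Gauss-Hermite (probabilists') quadrature
norm = np.sqrt(2.0 * np.pi)                  # quadrature weights sum to sqrt(2*pi)

# Project q onto the Hermite basis: c_n = E[q(theta) * He_n(theta)] / n!
coeffs = []
for n in range(order + 1):
    basis_n = He.hermeval(nodes, [0.0] * n + [1.0])
    c_n = np.sum(weights * q_under_model(nodes) * basis_n) / (norm * factorial(n))
    coeffs.append(c_n)

mean_q = coeffs[0]                                           # E[q] = c_0
var_q = sum(c**2 * factorial(n) for n, c in enumerate(coeffs) if n > 0)
robust_q = mean_q - np.sqrt(var_q)                           # conservative (mean - std) estimate

print(f"mean={mean_q:.4f}  std={np.sqrt(var_q):.4f}  robust={robust_q:.4f}")
```

Because the expansion is built from a handful of quadrature nodes rather than many Monte Carlo model samples, this kind of estimate is cheap enough to sit inside an off-policy critic update, which is the computational advantage the abstract alludes to.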