Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

Yuhao Ding, Javad Lavaei

doi:10.1609/aaai.v37i6.25900

Abstract

We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a setting that plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because it requires integrating constraint violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we identify two alternative conditions on the time-varying constraints under which safety can be guaranteed in the long run. We also propose the Periodically Restarted Optimistic Primal-Dual Proximal Policy Optimization (PROPD-PPO) algorithm, which can accommodate both conditions. Furthermore, we establish a dynamic regret bound and a constraint violation bound for the proposed algorithm in both the linear kernel CMDP function approximation setting and the tabular CMDP setting under the two alternative conditions. This paper provides the first provably efficient algorithm for non-stationary CMDPs with safe exploration.
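
The abstract combines two generic ingredients: a dual (Lagrange multiplier) update that reacts to constraint violation, and periodic restarts that discard stale information in a drifting environment. The following is a minimal sketch of those two ideas only, assuming a single constraint of the form "utility >= threshold"; it is not the PROPD-PPO algorithm from the paper, and the function name, parameters, and update rule are all hypothetical choices made for illustration.

```python
def restarted_dual_updates(utilities, threshold, step_size, restart_period, dual_max):
    """Return the dual variable used in each episode (illustrative sketch only).

    utilities      : per-episode estimates of the constrained utility (assumed given)
    threshold      : required utility level; the constraint is utility >= threshold
    step_size      : dual step size (hypothetical tuning parameter)
    restart_period : number of episodes between resets of the dual variable
    dual_max       : upper bound used to project the dual variable
    """
    dual = 0.0
    history = []
    for episode, utility in enumerate(utilities):
        # Periodic restart: drop information gathered under older dynamics.
        if episode % restart_period == 0:
            dual = 0.0
        history.append(dual)
        # Projected subgradient step: the dual grows when the constraint is
        # violated (utility < threshold) and shrinks when it is satisfied.
        dual = dual + step_size * (threshold - utility)
        dual = min(max(dual, 0.0), dual_max)
    return history


if __name__ == "__main__":
    print(restarted_dual_updates([0.0, 0.0, 2.0, 0.0], 1.0, 0.5, 2, 10.0))
```

In a full primal-dual method the dual value would weight the utility term in the policy update; here only the dual side is shown, and the restart period would in practice be chosen from the variation budgets.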

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 1

Similar Papers

Explicit Explore, Exploit, or Escape (E^4): near-optimal safety-constrained reinforcement learning in polynomial time
David M Bossens ... Nicholas Bishop
Machine Learning | VOL. 112
22 Jun 2022

MIMO Transmission Control in Fading Channels—A Constrained Markov Decision Process Formulation With Monotone Randomized Policies
Dejan V Djonin ... Vikram Krishnamurthy
IEEE Transactions on Signal Processing | VOL. 55
01 Oct 2007

Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning
Jiang Zhu ... Tao Luo
Telecommunication Systems | VOL. 42
12 Jun 2009

AlwaysSafe: Reinforcement Learning without Safety Constraint Violations during Training
11 Apr 2021
