WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Matthijs T. J. Spaan, Simon H. Tindemans

doi:10.1609/aaai.v35i12.17272

Abstract

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
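To make the difference between an expectation-based constraint and a tail-based one concrete, the following is a minimal sketch, not the authors' implementation. It computes conditional Value-at-Risk (CVaR) of a cost distribution in two ways: empirically from samples, and in closed form under the assumption that the cost is normally distributed. It also shows a simple projected-gradient-style update for a safety weight. All function names, the step size, and the cost bound are hypothetical and chosen only for illustration; here `alpha` denotes the fraction of worst-case outcomes considered, so `alpha = 1` recovers the plain expectation.

```python
import math
from statistics import NormalDist


def empirical_cvar(costs, alpha):
    """Mean of the worst alpha-fraction of sampled costs (higher cost = worse)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(costs, reverse=True)          # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    return sum(ordered[:k]) / k


def gaussian_cvar(mean, variance, alpha):
    """Upper-tail CVaR of a cost assumed to follow N(mean, variance)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if alpha == 1:
        return mean                                # whole distribution: expectation
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha)              # Value-at-Risk quantile
    return mean + math.sqrt(variance) * std_normal.pdf(z) / alpha


def update_safety_weight(weight, cvar, bound, step=0.01):
    """Hypothetical rule: raise the weight when CVaR exceeds the bound,
    lower it otherwise, and never let it go below zero."""
    return max(0.0, weight + step * (cvar - bound))


# Nine safe episodes and one disastrous one.
costs = [0.0] * 9 + [100.0]
bound = 25.0

expected_cost = empirical_cvar(costs, alpha=1.0)   # plain average
tail_cost = empirical_cvar(costs, alpha=0.1)       # worst 10% of episodes

print(expected_cost <= bound)  # the expectation-based constraint is satisfied
print(tail_cost <= bound)      # the CVaR-based constraint is violated
print(update_safety_weight(1.0, tail_cost, bound))
```

In this toy data the average cost stays under the bound while the worst 10% of episodes does not, which is the situation the abstract warns about. A weight driven by the tail measure therefore increases and shifts the trade-off toward safety.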

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 31
