Pure-Past Action Masking

Giovanni Varricchione, Natasha Alechina, Giuseppe De Giacomo, Brian Logan, Mehdi Dastani, Giuseppe Perelli

doi:10.1609/aaai.v38i19.30163

Abstract

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning (RL). In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints that depend on the history of the system rather than only on the current state of the (possibly hidden) Markov decision process (MDP). The features used in the safety constraints need not be the same as those used by the learning agent, which allows a clear separation of concerns between the safety constraints and the reward specification of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAM is as expressive as shields, another approach to enforcing non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
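To make the idea of history-based masking concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes one PPLTL formula per action, with an action permitted exactly when its formula holds on the history observed so far; that pairing, the tuple encoding of formulas, and the names PPLTLMonitor, ActionMasker, has_key and open_door are all hypothetical. The sketch relies on the standard fact that a pure-past formula can be evaluated incrementally, keeping only the truth values of its subformulas at the previous step.

```python
from typing import Dict, Iterable, List

# Hypothetical encoding: a formula is a nested tuple such as ("since", f, g).
Formula = tuple
TRUE: Formula = ("true",)

def atom(name: str) -> Formula: return ("atom", name)
def neg(f: Formula) -> Formula: return ("not", f)
def conj(f: Formula, g: Formula) -> Formula: return ("and", f, g)
def yesterday(f: Formula) -> Formula: return ("yesterday", f)
def since(f: Formula, g: Formula) -> Formula: return ("since", f, g)
def once(f: Formula) -> Formula: return since(TRUE, f)            # f held at some past step
def historically(f: Formula) -> Formula: return neg(once(neg(f)))  # f held at every past step

class PPLTLMonitor:
    """Evaluates one pure-past formula step by step over a growing history."""

    def __init__(self, formula: Formula) -> None:
        self.formula = formula
        self.prev: Dict[Formula, bool] = {}  # subformula values at the previous step

    def step(self, props: Iterable[str]) -> bool:
        """Consume the set of propositions true now; return the formula's value."""
        cur: Dict[Formula, bool] = {}
        value = self._eval(self.formula, frozenset(props), cur)
        self.prev = cur
        return value

    def _eval(self, f: Formula, props: frozenset, cur: Dict[Formula, bool]) -> bool:
        if f in cur:
            return cur[f]
        op = f[0]
        if op == "true":
            v = True
        elif op == "atom":
            v = f[1] in props
        elif op == "not":
            v = not self._eval(f[1], props, cur)
        elif op == "and":
            # Evaluate both operands (no short-circuit) so each is recorded for the next step.
            a = self._eval(f[1], props, cur)
            b = self._eval(f[2], props, cur)
            v = a and b
        elif op == "yesterday":
            self._eval(f[1], props, cur)       # record today's value for tomorrow
            v = self.prev.get(f[1], False)     # false at the first step
        elif op == "since":
            a = self._eval(f[1], props, cur)
            b = self._eval(f[2], props, cur)
            v = b or (a and self.prev.get(f, False))
        else:
            raise ValueError(f"unknown operator: {op}")
        cur[f] = v
        return v

class ActionMasker:
    """Assumed scheme: each action is guarded by its own pure-past formula."""

    def __init__(self, guards: Dict[str, Formula]) -> None:
        self.monitors = {a: PPLTLMonitor(f) for a, f in guards.items()}

    def allowed_actions(self, props: Iterable[str]) -> List[str]:
        props = frozenset(props)
        # Every monitor is advanced at every step, whether or not its action is allowed.
        return [a for a, mon in self.monitors.items() if mon.step(props)]

# The door may be opened only if the key was observed at some earlier (or the current) step.
masker = ActionMasker({"open_door": once(atom("has_key")), "wait": TRUE})
for observed in [set(), {"has_key"}, set()]:
    print(masker.allowed_actions(observed))
```

In this toy run, open_door stays permitted at the third step even though has_key is no longer observed, which is the sense in which the constraint is non-Markovian. The propositions passed to the masker are independent of whatever features a learning agent would use, mirroring the separation of concerns described in the abstract.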

Venue: Proceedings of the AAAI Conference on Artificial Intelligence