Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Peter Stone, Bo Wu, Ufuk Topcu, Rishi Shah

doi:10.1609/aaai.v35i9.16975

Abstract

In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted-reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experience. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. To avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function that provides additional reward throughout the learning process. We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up the average-reward learning rate without any reduction in the performance of the learned policy compared to relevant baselines.
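
As a minimal illustration of why shaping can be compatible with the average-reward criterion, the sketch below checks numerically that adding a state-potential term Φ(s') − Φ(s) to the reward leaves the long-run average reward of a fixed policy unchanged, since the potential differences telescope over time. This is a generic toy example and not the paper's construction: the three-state chain, the rewards, and the potential PHI are all made-up assumptions, whereas the paper derives its shaping function from a temporal logic formula.

```python
# Toy continuing task under one fixed policy. All numbers are hypothetical.
# P[s][s2] is the probability of moving from state s to state s2.
P = [
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.8],
    [0.7, 0.0, 0.3],
]
# R[s][s2] is the original reward received on the transition s -> s2.
R = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [5.0, 0.0, -1.0],
]
# Hypothetical potential function over states (stands in for domain knowledge).
PHI = [0.0, 3.0, -4.0]


def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return d


def average_reward(P, R, d):
    """Long-run average reward: expected one-step reward under the stationary distribution."""
    n = len(P)
    return sum(d[s] * P[s][s2] * R[s][s2] for s in range(n) for s2 in range(n))


def shape(R, phi):
    """Add the undiscounted potential difference phi[s2] - phi[s] to every transition reward."""
    n = len(R)
    return [[R[s][s2] + phi[s2] - phi[s] for s2 in range(n)] for s in range(n)]


d = stationary_distribution(P)
gain_original = average_reward(P, R, d)
gain_shaped = average_reward(P, shape(R, PHI), d)

print(f"average reward, original: {gain_original:.6f}")
print(f"average reward, shaped:   {gain_shaped:.6f}")
```

The two printed values agree up to floating-point error, because the expected potential at the next state equals the expected potential at the current state under the stationary distribution. The per-step rewards still differ, which is where shaping can influence how quickly a learner improves.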

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 12

Similar Papers

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
19 Apr 2021

Shaping Reward Learning Approach from Passive Samples
Yu Qian ... Zhi-Hua Zhou
Journal of Software | VOL. 24
06 Jan 2014

Knowledge-Based Reinforcement Learning for Data Mining
Daniel Kudenko ... Marek Grzes
01 Jan 2009

Dynamic Reward Shaping: Training a Robot by Voice
Ana C Tenorio-Gonzalez ... Luis Villaseñor-Pineda
01 Jan 2009
