Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Bo Wu, Ufuk Topcu, Suda Bharadwaj, Rishi Shah, Peter Stone

doi:10.26153/tsw/11579

Abstract

In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted-reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experience. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. To avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function that provides additional reward throughout the learning process. We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up the average-reward learning rate without any reduction in the performance of the learned policy compared to relevant baselines.
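As a rough illustration of why shaping need not change what an average-reward learner is optimizing, the sketch below adds an undiscounted, state-based potential difference to each reward and compares long-run averages. It is not the shaping function constructed in the paper: the potential table `phi`, the reward table `base`, and the random walk standing in for a fixed policy are all hypothetical. It only shows the general telescoping fact that the shaping terms sum to `phi[s_T] - phi[s_0]`, so their contribution to the average shrinks as the horizon grows.

```python
import random


def shaped_reward(reward, phi, s, s_next):
    """Original reward plus an undiscounted potential difference (assumed form)."""
    return reward + phi[s_next] - phi[s]


def average_rewards(num_steps, seed=0):
    """Return (average original reward, average shaped reward) over one trajectory."""
    rng = random.Random(seed)
    num_states = 5
    # Hypothetical potentials in [-10, 10] and per-state rewards in [0, 1].
    phi = [rng.uniform(-10, 10) for _ in range(num_states)]
    base = [rng.uniform(0, 1) for _ in range(num_states)]

    s = 0
    total, total_shaped = 0.0, 0.0
    for _ in range(num_steps):
        # A uniform random walk stands in for following some fixed policy.
        s_next = rng.randrange(num_states)
        r = base[s_next]
        total += r
        total_shaped += shaped_reward(r, phi, s, s_next)
        s = s_next
    return total / num_steps, total_shaped / num_steps


if __name__ == "__main__":
    avg, avg_shaped = average_rewards(10_000)
    # The gap is at most (max phi - min phi) / num_steps.
    print(abs(avg - avg_shaped))
```

Because the potentials here lie in [-10, 10], the two averages can differ by at most 20 divided by the number of steps, whichever trajectory is followed.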

Similar Papers

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
Yuqian Jiang ... Rishi Shah
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

Shaping Reward Learning Approach from Passive Samples
Yu Qian ... Zhi-Hua Zhou
Journal of Software | VOL. 24
06 Jan 2014

Knowledge-Based Reinforcement Learning for Data Mining
Daniel Kudenko ... Marek Grzes
01 Jan 2009

Dynamic Reward Shaping: Training a Robot by Voice
Ana C Tenorio-Gonzalez ... Luis Villaseñor-Pineda
01 Jan 2009
