Abstract

Reinforcement learning and planning have been revolutionized in recent years, due in part to the widespread adoption of deep convolutional neural networks and the resurgence of powerful methods for refining decision-making policies. However, sparse reward signals remain a pervasive problem in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, a mathematically rigorous structure for the underlying objective is rarely exploited. In this paper, we address this by specifying objectives in linear temporal logic over finite traces (LTL_f) and using the automaton representation of these objectives to define novel reward-shaping functions that mitigate the sparse-reward problem within modern Monte Carlo Tree Search (MCTS) methods. We further demonstrate that such verification-guided reward shaping can facilitate transfer learning between different environments that share the same objective.
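
A minimal sketch of the underlying idea, under the assumption of a potential-based shaping scheme: the LTL_f objective is compiled into a deterministic automaton, each automaton state receives a potential (for example, based on its distance to an accepting state), and the potential difference is added to the sparse environment reward during MCTS rollouts. The toy DFA, labels, and potential values below are purely illustrative and are not taken from the paper.

```python
# Hypothetical sketch: potential-based reward shaping from an LTL_f automaton.
# The DFA encodes the toy objective "eventually reach goal g" (LTL_f: F g).
# All names and numbers are illustrative assumptions, not the paper's method.

GAMMA = 0.99

# DFA transitions: (automaton_state, observed_label) -> next automaton_state
DFA_TRANSITIONS = {
    ("q0", "g"): "q_acc",     # seeing the goal label moves us to the accepting state
    ("q0", "other"): "q0",    # any other label keeps us waiting
    ("q_acc", "g"): "q_acc",
    ("q_acc", "other"): "q_acc",
}
ACCEPTING = {"q_acc"}

# Potential of each automaton state, e.g. inversely related to its graph
# distance from an accepting state (hand-set here for the toy DFA).
POTENTIAL = {"q0": 0.0, "q_acc": 1.0}


def step_automaton(q: str, label: str) -> str:
    """Advance the DFA on the label emitted by the environment transition."""
    return DFA_TRANSITIONS[(q, label)]


def shaped_reward(env_reward: float, q: str, q_next: str) -> float:
    """Potential-based shaping: F = gamma * phi(q') - phi(q), added to the
    (possibly sparse) environment reward before backup in MCTS rollouts."""
    return env_reward + GAMMA * POTENTIAL[q_next] - POTENTIAL[q]


if __name__ == "__main__":
    # Tiny rollout: two uninformative steps, then the goal label appears.
    q = "q0"
    for label, r_env in [("other", 0.0), ("other", 0.0), ("g", 1.0)]:
        q_next = step_automaton(q, label)
        print(label, shaped_reward(r_env, q, q_next))
        q = q_next
```

Because the shaping term is a potential difference over automaton states, it rewards progress toward satisfying the objective without altering which policies are optimal, which is what makes it useful against sparse environment rewards.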
