Abstract

The reinforcement learning (RL) problem is typically formalized as a Markov Decision Process (MDP), in which an agent interacts with an environment to maximize the long-term expected reward. As an important branch of RL, Lifelong RL requires the agent to solve a series of tasks in succession, each modeled as an MDP and drawn from some distribution. A crucial issue in Lifelong RL is how best to use the knowledge from previous tasks to improve performance in the current task. As a pioneering work in this field, MaxQInit takes the maximum over the action-values learned from previous tasks’ environmental rewards as the initial action-value of the current task. In this way, MaxQInit improves the initial performance in the current task and reduces the sample complexity of learning. However, the rewards obtained during learning are usually delayed and sparse, which dramatically decreases learning efficiency. In this paper, we propose a new method, Shaping Rewards for Lifelong RL (SR-LLRL), which speeds up the lifelong learning process by shaping timely and informative rewards for each task. Critically, we construct the Lifetime Reward Shaping (LRS) function from the knowledge of optimal trajectories collected in previous tasks, providing additional reward information for the current task. Compared with MaxQInit, our method exhibits higher learning efficiency and superior performance in Lifelong RL experiments.
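The two ideas contrasted in the abstract can be illustrated with a minimal Python sketch. The first function follows the description of MaxQInit given here: the initial action-value is the maximum over the action-values learned in previous tasks. The second function is only a stand-in, because the abstract does not give the form of the LRS function; it assumes, purely for illustration, a bonus proportional to how often a state-action pair appears in previously collected trajectories. The names `max_q_init`, `visit_count_shaping`, `default`, and `scale` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def max_q_init(previous_q_tables, default=0.0):
    """Build an initial Q-table as the element-wise maximum over earlier Q-tables.

    Each Q-table is a dict mapping (state, action) to a learned action-value.
    Pairs never seen in any previous task fall back to `default`
    (an assumption of this sketch).
    """
    init = {}
    for q_table in previous_q_tables:
        for key, value in q_table.items():
            if key not in init or value > init[key]:
                init[key] = value
    return defaultdict(lambda: default, init)


def visit_count_shaping(trajectories, scale=1.0):
    """Return a function that adds a shaping bonus to the environmental reward.

    Assumption: the bonus is the normalized visit count of (state, action)
    in the given trajectories. This is a placeholder, not the paper's LRS.
    """
    counts = defaultdict(int)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[(state, action)] += 1
    peak = max(counts.values(), default=0)

    def shaped_reward(state, action, reward):
        if peak == 0:
            return reward
        bonus = scale * counts.get((state, action), 0) / peak
        return reward + bonus

    return shaped_reward


if __name__ == "__main__":
    q_task1 = {("s0", "a0"): 1.0, ("s0", "a1"): 0.2}
    q_task2 = {("s0", "a0"): 0.5, ("s0", "a1"): 0.9, ("s1", "a0"): 0.3}
    q_init = max_q_init([q_task1, q_task2])
    print(q_init[("s0", "a0")], q_init[("s0", "a1")], q_init[("s1", "a0")])

    shaped = visit_count_shaping([[("s0", "a0"), ("s1", "a0")], [("s0", "a0")]])
    print(shaped("s0", "a0", 0.0), shaped("s1", "a0", 0.0))
```

In this sketch the initialization changes only where learning starts, while the shaping function changes the reward signal seen at every step, which is the distinction the abstract draws between MaxQInit and SR-LLRL.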
