Abstract

Lifelong reinforcement learning continually accumulates shared knowledge by estimating inter-task relationships from the training data of previously learned tasks, so that knowledge reuse accelerates learning on new tasks. Existing methods represent these inter-task relationships with a linear model over task features, which allows a new task to be accomplished without any learning. However, such methods may be ineffective in general scenarios, because a linear model must map low-dimensional task features to a high-dimensional policy-parameter space. In addition, the objective function used in lifelong reinforcement learning can compute errors poorly when, owing to correlations among parameters, the errors of some policy parameters suppress those of others. In this paper, we develop a policy generation network that models the inter-task relationships nonlinearly, mapping low-dimensional task features to high-dimensional policy parameters so as to represent the shared knowledge more effectively. We also propose a novel objective function for lifelong reinforcement learning that relieves this error-computation deficiency by adding weight constraints on the errors. We empirically demonstrate that our method improves zero-shot policy performance across a variety of dynamical systems.
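
The following is a minimal sketch, not the authors' implementation: it illustrates the general idea of a policy-generation network that nonlinearly maps a low-dimensional task feature vector to the flattened parameters of a task-specific policy, together with a per-parameter weighted reconstruction loss standing in for the abstract's "weight constraints for errors". The network sizes, feature dimension, training data, and the exact form of the loss are assumptions for illustration only.

```python
# Sketch of a nonlinear policy-generation network (assumed architecture,
# not the paper's exact model): task features -> policy parameters.
import torch
import torch.nn as nn

class PolicyGenerationNetwork(nn.Module):
    def __init__(self, feature_dim: int, policy_param_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Nonlinear mapping from low-dimensional task features to the
        # high-dimensional policy-parameter space.
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, policy_param_dim),
        )

    def forward(self, task_features: torch.Tensor) -> torch.Tensor:
        return self.net(task_features)

def weighted_param_loss(predicted: torch.Tensor,
                        target: torch.Tensor,
                        weights: torch.Tensor) -> torch.Tensor:
    # Per-parameter weights keep large errors on some coordinates from
    # masking errors on others (a stand-in for the weight constraints).
    return torch.mean(weights * (predicted - target) ** 2)

if __name__ == "__main__":
    feature_dim, policy_param_dim = 4, 128          # assumed dimensions
    gen = PolicyGenerationNetwork(feature_dim, policy_param_dim)
    optimizer = torch.optim.Adam(gen.parameters(), lr=1e-3)

    # Placeholder data: features and learned policy parameters of past tasks.
    task_features = torch.randn(16, feature_dim)
    learned_policies = torch.randn(16, policy_param_dim)
    weights = torch.ones(policy_param_dim)

    for _ in range(100):
        optimizer.zero_grad()
        loss = weighted_param_loss(gen(task_features), learned_policies, weights)
        loss.backward()
        optimizer.step()

    # Zero-shot: generate a policy for a new task from its features alone.
    new_task_features = torch.randn(1, feature_dim)
    zero_shot_policy_params = gen(new_task_features)
```

In this sketch the generator is fitted to policies learned on previous tasks and then queried with the features of an unseen task, so the new policy is produced without any further learning on that task.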
