Abstract

When dealing with real-world reinforcement learning (RL) tasks, we must be aware that the environment may change suddenly. A robust policy should be able to handle such changes and adapt to the new environment rapidly. Context-based meta reinforcement learning aims to learn environment-adaptable policies. These methods adopt a context encoder to perceive the environment on the fly, after which a contextual policy makes environment-adaptive decisions according to the context. However, previous methods suffer from lagged and unstable context extraction and therefore struggle to handle sudden changes. This paper proposes an environment-sensitive contextual policy learning (ESCP) approach to improve both the sensitivity and the robustness of context encoding. ESCP comprises three key components: variance minimization, which forces rapid and stable encoding of the environment context; relational matrix determinant maximization, which avoids trivial solutions; and a history-truncated recurrent neural network, which avoids interference from stale memory. We use a grid-world task and five locomotion control tasks with changing parameters to empirically assess our algorithm. Experimental results show that, in environments with both in-distribution and out-of-distribution parameter changes, ESCP not only recovers the environment encoding more accurately but also adapts more rapidly to the post-change environment (10x faster in the grid world), while maintaining or improving return performance compared with state-of-the-art meta RL methods.
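
To make the first two components concrete, below is a minimal illustrative sketch (not the authors' implementation) of a recurrent context encoder trained with a variance-minimization term, which pushes the codes of a single environment to stay stable over time, and a determinant-maximization term on a relational matrix of per-environment mean codes, which keeps codes of different environments spread apart and so avoids trivial collapse. The class and function names, dimensions, and loss weight are assumptions chosen for illustration.

```python
# Hedged sketch of ESCP-style context-encoding losses (assumed names and weights).
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """GRU-based context encoder over (state, action, reward) transition features."""

    def __init__(self, input_dim: int, hidden_dim: int, context_dim: int):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, context_dim)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: [batch, time, input_dim]; returns a context code per step.
        h, _ = self.rnn(transitions)
        return self.head(h)  # [batch, time, context_dim]


def escp_style_losses(codes: torch.Tensor):
    """codes: [num_envs, time, context_dim], one trajectory segment per environment."""
    # 1) Variance minimization: each environment's codes should be stable across time.
    var_loss = codes.var(dim=1).mean()

    # 2) Determinant maximization on the relational (Gram) matrix of per-environment
    #    mean codes: spreads different environments apart, avoiding trivial solutions.
    means = codes.mean(dim=1)                      # [num_envs, context_dim]
    rel = means @ means.t()                        # relational matrix [num_envs, num_envs]
    rel = rel + 1e-4 * torch.eye(rel.size(0))      # jitter for numerical stability
    neg_logdet = -torch.logdet(rel)                # minimize this to maximize the determinant
    return var_loss, neg_logdet


if __name__ == "__main__":
    enc = ContextEncoder(input_dim=8, hidden_dim=64, context_dim=4)
    segments = torch.randn(5, 20, 8)               # 5 environments, 20 steps each
    codes = enc(segments)
    var_loss, neg_logdet = escp_style_losses(codes)
    total = var_loss + 0.1 * neg_logdet            # weight is an assumed hyperparameter
    total.backward()
    print(float(var_loss), float(neg_logdet))
```

In ESCP the encoder is also history-truncated, i.e. the recurrent state is periodically reset so that old memory from before an environment change cannot interfere with the current context estimate; that part is omitted from the sketch above.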
