Abstract

Meta reinforcement learning (meta-RL) is a promising technique for fast task adaptation because it leverages prior knowledge from previous tasks. Recently, context-based meta-RL has been proposed to improve data efficiency by applying a principled framework that divides the learning procedure into task inference and task execution. However, task information is not adequately leveraged in this approach, leading to inefficient exploration. To address this problem, we propose a novel context-based meta-RL framework with an improved exploration mechanism. For the exploration-execution problem in context-based meta-RL, we propose a novel objective that employs two exploration terms to encourage better exploration in the action space and the task embedding space, respectively. The first term encourages diversity in task inference, while the second term, named action information, either shares or hides task information depending on the exploration stage. We divide the meta-training procedure into a task-independent exploration stage and a task-relevant exploration stage according to how action information is used. By decoupling task inference from task execution and optimizing separate objectives in the two exploration stages, we can efficiently learn the policy and task inference networks. We compare our algorithm with several popular meta-RL methods on MuJoCo benchmarks with both dense and sparse reward settings. The empirical results show that our method significantly outperforms the baselines on these benchmarks in terms of sample efficiency and task performance.
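The abstract does not specify the exact form of the proposed objective; the following is only a rough conceptual sketch of how a base RL loss could be combined with a diversity term on the task embedding and an action-information term whose sign flips between the task-independent and task-relevant stages. All function and parameter names (task_inference_diversity, action_information, alpha, beta, task_relevant_stage) are hypothetical placeholders, not the authors' implementation.

```python
# Conceptual sketch only; not the authors' method or code.
import torch


def task_inference_diversity(z: torch.Tensor) -> torch.Tensor:
    """Proxy for diversity in the task embedding space: mean pairwise
    distance between inferred task embeddings z of shape (batch, dim)."""
    return torch.cdist(z, z).mean()


def action_information(log_pi_with_z: torch.Tensor,
                       log_pi_without_z: torch.Tensor) -> torch.Tensor:
    """Crude surrogate for 'action information': how much conditioning on
    the task embedding changes the policy's action log-probabilities."""
    return (log_pi_with_z - log_pi_without_z).mean()


def exploration_objective(rl_loss: torch.Tensor,
                          z: torch.Tensor,
                          log_pi_with_z: torch.Tensor,
                          log_pi_without_z: torch.Tensor,
                          task_relevant_stage: bool,
                          alpha: float = 0.1,
                          beta: float = 0.1) -> torch.Tensor:
    """Combine the base RL loss with the two exploration terms.

    In the task-independent stage the action-information term is penalized
    (the policy hides task information and explores broadly); in the
    task-relevant stage it is rewarded (the policy exploits task information).
    """
    diversity = task_inference_diversity(z)
    act_info = action_information(log_pi_with_z, log_pi_without_z)
    sign = 1.0 if task_relevant_stage else -1.0
    return rl_loss - alpha * diversity - sign * beta * act_info
```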
