Abstract

We propose InfoGoal, an information bottleneck (IB) method for goal representation learning: a self-supervised approach to generalizable goal-conditioned reinforcement learning (RL). Goal-conditioned RL learns a policy from reward signals that predicts actions for reaching desired goals. However, the policy can overfit to task-irrelevant information contained in the goal and consequently generalize incorrectly or ineffectively to other goals. A goal representation that contains sufficient task-relevant information and minimal task-irrelevant information is guaranteed to reduce generalization error. In goal-conditioned RL, however, this tradeoff is difficult to balance: the learning signals, i.e., rewards, are sparse and delayed, and information compression inevitably sacrifices some task-relevant information. InfoGoal learns a minimal and sufficient goal representation from dense, immediate self-supervised learning signals. Meanwhile, it adaptively adjusts the weight of the information-minimization term to achieve maximal compression at a reasonable cost in task-relevant information. Consequently, InfoGoal enables the policy to generate targeted trajectories toward states where the desired goal can be found with high probability, and to explore those states broadly. We conduct experiments on both simulated and real-world tasks, and our method significantly outperforms baseline methods in policy optimality and in the success rate of reaching unseen test goals. Video demos are available at infogoal.github.io.
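The abstract describes an IB-style objective with an adaptively weighted information-minimization term. As a rough illustration only, and not the paper's actual algorithm, the sketch below shows a standard variational-IB formulation of such a loss in PyTorch: the stochastic goal encoder, the KL-based compression term, the target rate `kl_target`, and the dual-ascent update for the compression weight `beta` are all illustrative assumptions, since the abstract does not specify the objective.

```python
import torch
import torch.nn as nn


class GoalEncoder(nn.Module):
    """Stochastic goal encoder q(z | g) producing a diagonal Gaussian."""

    def __init__(self, goal_dim: int, z_dim: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(goal_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, g: torch.Tensor):
        h = self.body(g)
        return self.mu(h), self.logvar(h)


def ib_objective(mu, logvar, task_loss, beta):
    """Variational IB loss: task term + beta * compression (rate) term.

    The KL divergence from q(z|g) to a standard normal prior upper-bounds
    the mutual information I(Z; G) and plays the role of the
    information-minimization term; beta trades compression against
    preserving task-relevant information.
    """
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return task_loss + beta * kl, kl


def update_beta(beta, kl, kl_target, lr=1e-3):
    """Illustrative dual-ascent update for the adaptive compression weight:
    raise beta when the rate exceeds the target (compress harder), lower it
    when below (preserve more information)."""
    return max(0.0, beta + lr * (kl.item() - kl_target))


# Usage sketch: sample a goal representation via the reparameterization trick
# and combine a (task-specific) loss on z with the compression term.
encoder = GoalEncoder(goal_dim=8, z_dim=4)
goals = torch.randn(32, 8)
mu, logvar = encoder(goals)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
task_loss = z.pow(2).mean()  # placeholder for a task-relevance loss on z
loss, kl = ib_objective(mu, logvar, task_loss, beta=0.1)
loss.backward()
```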
