Abstract

Contrastive learning has been used to learn useful low-dimensional state representations in visual reinforcement learning (RL). Such state representations substantially improve the sample efficiency of visual RL. Nevertheless, existing contrastive learning-based RL methods suffer from unstable training. This instability stems from the fact that contrastive learning requires an extremely large batch size (e.g., 4096 or larger), whereas current contrastive learning-based RL methods typically use a small batch size (e.g., 512). In this paper, we propose a discrete information bottleneck (DIB) approach to address this problem. DIB applies discretization and an information bottleneck to contrastive learning, representing the state with a concise discrete representation. Using this discrete representation for policy learning yields more stable training and higher sample efficiency with a small batch size. We demonstrate the advantage of DIB's discrete state representation on several continuous control tasks from the DeepMind Control Suite. In the experiments, DIB outperforms prior visual RL methods, both model-based and model-free, in terms of performance and sample efficiency.
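To make the idea of a discrete bottleneck on a contrastive encoder concrete, the sketch below shows one plausible instantiation: a vector-quantization-style discretization of the encoder output combined with a standard InfoNCE loss. This is a hypothetical illustration under assumed shapes and hyperparameters, not the paper's implementation; the names `DiscreteBottleneckEncoder` and `info_nce_loss` are placeholders introduced here for illustration.

```python
# Hypothetical sketch (not the paper's code): discretize a contrastive
# encoder's latent with a nearest-neighbor codebook lookup, then train
# with a standard InfoNCE objective on a small batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteBottleneckEncoder(nn.Module):
    """Encodes an observation, then snaps the latent to its nearest
    codebook entry (straight-through gradient), yielding a discrete state."""
    def __init__(self, obs_dim=64, latent_dim=32, codebook_size=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, obs):
        z = self.encoder(obs)                        # (B, D) continuous latent
        dist = torch.cdist(z, self.codebook.weight)  # (B, K) distances to codes
        idx = dist.argmin(dim=1)                     # nearest code index per sample
        z_q = self.codebook(idx)                     # (B, D) quantized latent
        # Straight-through estimator: forward pass uses z_q, backward uses z.
        # (A full implementation would also add a VQ commitment/codebook loss.)
        z_st = z + (z_q - z).detach()
        return z_st, idx

def info_nce_loss(anchor, positive, temperature=0.1):
    """Standard InfoNCE: the positive for each anchor is the matching row."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature     # (B, B) similarity logits
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Toy usage with a small batch (the regime the abstract targets).
encoder = DiscreteBottleneckEncoder()
obs = torch.randn(32, 64)                 # a batch of observations
obs_aug = obs + 0.05 * torch.randn_like(obs)  # a crude stand-in for augmentation
z_a, _ = encoder(obs)
z_p, _ = encoder(obs_aug)
loss = info_nce_loss(z_a, z_p)
loss.backward()
```

The design intuition this sketch captures is that quantizing the latent limits the information the representation can carry, which is one way a bottleneck could reduce the sensitivity of the contrastive objective to small batch sizes; the specific mechanism used by DIB may differ.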
