Abstract
Unsupervised pre-training in reinforcement learning enables the agent to acquire prior knowledge of the environment, which is then fine-tuned in the supervised stage to adapt quickly to various downstream tasks. In the absence of task-related rewards, pre-training aims to acquire policies (i.e., behaviors) that generate diverse trajectories to explore and master the environment. Previous research assigns states to their associated behaviors by learning a supervised discriminator. However, an underlying problem persists: such a discriminator is trained with insufficient relevant data, leading to underestimated rewards for new states and inadequate exploration. To this end, we introduce an unsupervised active pre-training algorithm for diverse behavior induction (APD). We explicitly characterize the behavior variables with a state-dependent sampling method, so that the agent can decompose the entire state space into parts for fine-grained and diverse behavior learning. Specifically, a particle-based entropy estimator is applied to optimize a combined objective of behavioral entropy and mutual information. Moreover, we develop behavior-based representation learning to compress states into a latent space. Experiments show that our method improves exploration efficiency and outperforms most state-of-the-art unsupervised algorithms on a number of continuous control tasks in the DeepMind Control Suite.
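The following is a minimal sketch, not the authors' released code, of the kind of intrinsic reward the abstract describes: a particle-based (k-nearest-neighbor) entropy estimate over encoded states, combined with a discriminator-style mutual-information term between states and behavior variables. The names (`particle_entropy_reward`, `mutual_info_reward`, `knn_k`, the encoder/discriminator, and the mixing coefficients) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a combined particle-entropy + mutual-information intrinsic reward (assumed form).
import torch
import torch.nn.functional as F

def particle_entropy_reward(latent_states: torch.Tensor, knn_k: int = 12) -> torch.Tensor:
    """Particle-based state-entropy proxy: average log distance to the k nearest neighbors.

    latent_states: (batch, dim) encoded states; larger neighbor distances suggest
    less-visited regions and therefore a higher exploration reward.
    """
    dists = torch.cdist(latent_states, latent_states)    # pairwise Euclidean distances
    knn_dists, _ = dists.topk(knn_k + 1, largest=False)  # +1 because the self-distance is 0
    knn_dists = knn_dists[:, 1:]                          # drop the self-distance column
    return torch.log(1.0 + knn_dists.mean(dim=1))

def mutual_info_reward(logits: torch.Tensor, behavior_ids: torch.Tensor) -> torch.Tensor:
    """Discriminator-based lower bound on I(state; behavior): log q(z|s) - log p(z).

    logits: (batch, num_behaviors) discriminator outputs for each state;
    behavior_ids: (batch,) indices of the behavior variable under which each state was collected.
    Assumes a uniform prior p(z) over behaviors for the baseline term.
    """
    log_q = F.log_softmax(logits, dim=-1).gather(1, behavior_ids.unsqueeze(1)).squeeze(1)
    log_p = -torch.log(torch.tensor(float(logits.shape[-1])))
    return log_q - log_p

# Combined intrinsic reward (coefficients are placeholders, not values from the paper):
# r = alpha * particle_entropy_reward(encoder(s)) + beta * mutual_info_reward(discriminator(encoder(s)), z)
```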