Abstract

Reinforcement learning is applied across a range of domains, including robotics, games, and natural language processing, but it struggles in environments with sparse rewards. Random network distillation (RND) addresses this problem by providing an intrinsic reward, yet its effectiveness hinges on the initialization of its randomly weighted target network, and the reliance on fixed random features constrains the agent's exploration. This paper proposes a self-supervised network distillation (SSND) exploration method that removes RND's dependence on a randomly initialized target network and improves exploration in sparse-reward environments. The method uses the distillation error as the intrinsic reward, with the target network trained by self-supervised learning. During training of the predictor network, we observed fluctuations in both the loss and the intrinsic reward, which degrade the agent's performance. To resolve this, we add batch normalization layers to the target network, mitigating intrinsic-reward anomalies caused by the target network's instability. Experiments show that SSND outperforms RND in exploration speed and performance.
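
The distillation-error reward summarized above can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch-style setup; the network shapes, the names TargetNet, PredictorNet, and intrinsic_reward, and all hyperparameters are illustrative assumptions, not the paper's implementation. In SSND the target network would additionally be updated with a self-supervised loss (not shown here), whereas in RND it stays frozen at its random initialization.

```python
# Sketch of an intrinsic reward from distillation error (assumed setup,
# not the paper's exact code).
import torch
import torch.nn as nn


class TargetNet(nn.Module):
    """Embedding network; in SSND it is trained with a self-supervised loss
    and includes BatchNorm to keep its outputs (and the reward) stable."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.BatchNorm1d(256),  # stabilizes features as the target net is updated
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class PredictorNet(nn.Module):
    """Predictor trained to regress the target embedding; its error is the reward."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def intrinsic_reward(target: TargetNet, predictor: PredictorNet,
                     obs: torch.Tensor) -> torch.Tensor:
    """Per-state intrinsic reward = squared distillation error, as in RND/SSND."""
    with torch.no_grad():
        target.eval()  # use running BatchNorm statistics when scoring states
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)


# Usage: novel states are poorly predicted, so they receive larger bonuses.
if __name__ == "__main__":
    obs_dim = 16
    target, predictor = TargetNet(obs_dim), PredictorNet(obs_dim)
    batch = torch.randn(32, obs_dim)
    print(intrinsic_reward(target, predictor, batch).shape)  # torch.Size([32])
```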
