Abstract

Reinforcement learning is applied across a range of domains, including robotics, games, and natural language processing, but it struggles in environments with sparse rewards. Random network distillation (RND) addresses this problem by providing an intrinsic reward, yet its effectiveness hinges on the initialization of its randomly weighted target network, and the reliance on fixed random features constrains the agent's exploration. This paper proposes a self-supervised network distillation (SSND) exploration method that removes RND's dependence on a randomly initialized target network and improves exploration in sparse-reward environments. The method uses the distillation error as the intrinsic reward, with the target network trained by self-supervised learning. During training of the predictor network, we observed fluctuations in both the loss and the intrinsic reward, which degrade the agent's performance. To resolve this, we add batch normalization layers to the target network, mitigating intrinsic-reward anomalies caused by the target network's instability. Experiments show that SSND outperforms RND in exploration speed and performance.
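
The distillation-error reward summarized above can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch-style setup; the network shapes, the names TargetNet, PredictorNet, and intrinsic_reward, and all hyperparameters are illustrative assumptions, not the paper's implementation. In SSND the target network would additionally be updated with a self-supervised loss (not shown here), whereas in RND it stays frozen at its random initialization.

```python
# Sketch of an intrinsic reward from distillation error (assumed setup,
# not the paper's exact code).
import torch
import torch.nn as nn


class TargetNet(nn.Module):
    """Embedding network; in SSND it is trained with a self-supervised loss
    and includes BatchNorm to keep its outputs (and the reward) stable."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.BatchNorm1d(256),  # stabilizes features as the target net is updated
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class PredictorNet(nn.Module):
    """Predictor trained to regress the target embedding; its error is the reward."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def intrinsic_reward(target: TargetNet, predictor: PredictorNet,
                     obs: torch.Tensor) -> torch.Tensor:
    """Per-state intrinsic reward = squared distillation error, as in RND/SSND."""
    with torch.no_grad():
        target.eval()  # use running BatchNorm statistics when scoring states
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)


# Usage: novel states are poorly predicted, so they receive larger bonuses.
if __name__ == "__main__":
    obs_dim = 16
    target, predictor = TargetNet(obs_dim), PredictorNet(obs_dim)
    batch = torch.randn(32, obs_dim)
    print(intrinsic_reward(target, predictor, batch).shape)  # torch.Size([32])
```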
