Abstract
This study proposes a target-network update rule for deep reinforcement learning (DRL) based on mutual information (MI) and rewards. In DRL, the target network is updated from the Q network to reduce the variance of training and thereby stabilize learning. When the update is not handled properly, a common mitigation is to reduce the overall update rate; however, simply slowing the update is undesirable because it also slows the decay of the learning rate. Prior studies have addressed these issues with the t-soft update based on the Student's-t distribution, or with methods that dispense with the target network altogether. However, the Student's-t distribution can fail in certain situations or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly via experience replay. This study therefore combines the MI and the rewards available in the experience replay of DRL to improve both the decaying learning rate and the target-network update. Utilizing rewards is particularly well suited to environments with intrinsic symmetry. Experiments on various OpenAI Gym environments confirm that stable learning is achieved while preserving the improvement in the decaying learning rate.
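As a rough illustration of the setting the abstract describes (not the authors' exact algorithm), the sketch below shows a standard Polyak (soft) target-network update in PyTorch, with the update rate modulated by a score built from an MI estimate and the mean reward of a replay minibatch. The `modulated_tau` function, its weights, and the sigmoid squashing are illustrative assumptions; the paper's actual combination rule may differ.

```python
import math
import torch
import torch.nn as nn

def soft_update(target: nn.Module, online: nn.Module, tau: float) -> None:
    """Polyak (soft) update: theta_target <- (1 - tau) * theta_target + tau * theta_online."""
    with torch.no_grad():
        for p_t, p in zip(target.parameters(), online.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)

def modulated_tau(base_tau: float, mi_estimate: float, mean_reward: float,
                  mi_weight: float = 0.5, reward_weight: float = 0.5) -> float:
    """Hypothetical modulation of the update rate by an MI/reward score.

    The weighting and squashing here are assumptions for illustration only;
    the paper's exact rule for combining MI and rewards may differ.
    """
    score = mi_weight * mi_estimate + reward_weight * mean_reward
    # Squash to (0, 1) so the result remains a valid interpolation coefficient.
    return base_tau * (1.0 / (1.0 + math.exp(-score)))

# Usage after each gradient step on the Q network (toy networks for illustration):
q_net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(q_net.state_dict())

tau = modulated_tau(base_tau=0.005, mi_estimate=0.8, mean_reward=1.2)
soft_update(target_net, q_net, tau)
```

Under this kind of scheme, minibatches with a higher MI/reward score nudge the target network more strongly toward the Q network, while low-score batches leave it nearly frozen, which matches the abstract's goal of stabilizing learning without uniformly slowing all updates.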