Abstract

As one of the first successful attempts to combine deep neural networks with reinforcement learning, the Deep Q-learning Network (DQN) has drawn considerable attention from reinforcement learning researchers. One of its most important components is the target network, which stabilizes the learning process. For complex network architectures, however, the target network requires extra memory to store a second copy of the network weights and additional computation to calculate targets. We therefore propose the Deep Hybrid Q-learning Network (DHQN) algorithm, which introduces an alternative approach, Random Hybrid Optimization (RHO), that simplifies DQN and achieves more stable and faster learning without a target network. We illustrate that RHO can slow divergence in the classical off-policy counterexample, the θ → 2θ problem. We also verify the effectiveness of DHQN on several control and Atari domains, showing that DHQN outperforms both DQN without a target network and the original DQN.
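
For context, the sketch below shows the standard DQN target computation (not the paper's DHQN/RHO algorithm): the frozen target network is a full copy of the Q-network weights and requires an extra forward pass per update, which is the memory and compute cost the abstract refers to. The network shape, hyperparameters, and sync interval `C` are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

# Illustrative Q-network; architecture and hyperparameters are placeholders.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)  # second copy of every weight: the extra memory cost
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_target(reward, next_state, done):
    # Bootstrapped target uses the frozen target network for stability
    # (an extra forward pass per update: the extra compute cost).
    with torch.no_grad():
        max_next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * max_next_q

def update(state, action, reward, next_state, done):
    # Q(s, a) for the actions actually taken in the batch.
    q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, td_target(reward, next_state, done))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# In standard DQN the target network is synchronized only every C steps, e.g.:
#   if step % C == 0:
#       target_net.load_state_dict(q_net.state_dict())
```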
