Abstract

The popular DQN algorithm is known to suffer from substantial overestimation of state-action values in reinforcement learning problems such as games in the Atari 2600 domain and path planning. To reduce these overestimations during learning, we present a novel combination of double Q-learning and the dueling DQN architecture, which we call the Variant of Double Dueling DQN (V-D D3QN). The core idea behind V-D D3QN is to use two dueling DQN networks: at each time step, one network is randomly selected to have its parameters updated, while the remaining network is used to determine the update targets. We evaluate the algorithm in a customized virtual grid-map environment. Our experiments demonstrate that the proposed algorithm not only reduces overestimations more effectively than Double DQN (DDQN), but also achieves much better performance in the route planning domain, generalizing well to new and rapidly changing environments.
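To make the update rule described above concrete, the following is a minimal PyTorch sketch of a double-estimator update with two dueling networks, where the randomly selected network chooses the greedy next action and the other network evaluates it. The class and function names (`DuelingQNet`, `update_step`), network sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk with separate value and advantage streams."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Combine streams; subtracting the mean advantage keeps the decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def update_step(net_a, net_b, opt_a, opt_b, batch, gamma=0.99):
    """One sketch update: randomly pick which network to train; the trained network
    selects the greedy next action, the other network evaluates that action."""
    obs, action, reward, next_obs, done = batch
    if random.random() < 0.5:
        online, other, opt = net_a, net_b, opt_a
    else:
        online, other, opt = net_b, net_a, opt_b

    with torch.no_grad():
        # Decouple action selection (online net) from action evaluation (other net)
        # to reduce the overestimation bias of the max operator.
        next_action = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = other(next_obs).gather(1, next_action).squeeze(1)
        td_target = reward + gamma * (1.0 - done) * next_q

    q = online(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, td_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```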
