Abstract
Reinforcement learning is a class of algorithms that enables computers to learn how to accumulate reward effectively in an environment and ultimately achieve strong results. Within reinforcement learning, the exploration-exploitation tradeoff is a central concept, since a good strategy can improve both learning speed and the final total reward. In this work, we applied the DQN algorithm with different exploration-exploitation strategies to solve a traditional route-finding problem. The experimental results show that an epsilon-greedy strategy in which the epsilon value drops parabolically as the reward improves performs best, while performance is unsatisfactory after incorporating the softmax function. We hypothesize that the simplicity of the maze used in this work, in which the agent attempts to find the shortest path, makes applying softmax to further encourage exploration inadequate. Future work therefore involves experimenting with mazes of different scales and complexities and observing which exploration-exploitation strategy works best in each condition.
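To make the two action-selection strategies concrete, the sketch below shows one plausible implementation of epsilon-greedy selection with a parabolic epsilon decay and of softmax (Boltzmann) selection over Q-values. The decay schedule, the use of training progress as a proxy for reward improvement, the temperature value, and all function names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the two exploration strategies named in the abstract.
# The parabolic schedule, temperature, and progress proxy are assumptions.
import numpy as np

def parabolic_epsilon(progress, eps_start=1.0, eps_end=0.05):
    """Epsilon falls along a parabola as training progress goes 0 -> 1.

    The paper ties the drop to reward improvement; here `progress` is a
    generic stand-in for that quantity (e.g., normalized episode count
    or normalized reward gain).
    """
    frac = min(max(progress, 0.0), 1.0)
    return eps_end + (eps_start - eps_end) * (1.0 - frac) ** 2

def epsilon_greedy(q_values, epsilon, rng):
    """Take a uniformly random action with probability epsilon,
    otherwise take the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature, rng):
    """Sample an action from a Boltzmann distribution over Q-values;
    lower temperature concentrates probability on the greedy action."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [0.1, 0.5, 0.2, 0.2]          # dummy Q-values for one maze state
    eps = parabolic_epsilon(progress=0.2)
    print("epsilon:", eps)
    print("eps-greedy action:", epsilon_greedy(q, eps, rng))
    print("softmax action:", softmax_action(q, temperature=0.5, rng=rng))
```

In a DQN training loop, `q_values` would come from a forward pass of the Q-network at the current state; either selection function can then be swapped in as the behavior policy.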