Abstract

Multi-robot task allocation methods in robot soccer systems tend to fall into local optima and suffer from poor real-time performance, so a new multi-robot task allocation method is proposed. First, to find optimal actions faster and more efficiently, and to address the shortcoming that traditional Q-learning often fails to propagate negative values, we propose a new way to propagate them: a Q-learning method based on negative rewards. Next, to adapt to a dynamic external environment, an adaptive ε-greedy method is proposed, in which the mode of operation is determined by the ε value. This method builds on the classical ε-greedy strategy: during problem solving, ε changes adaptively as needed, yielding a better balance between exploration and exploitation in reinforcement learning. Finally, we apply this method to a robot soccer game system. Experiments show that dangerous actions can be effectively avoided by the Q-learning method that propagates negative rewards, and that the adaptive ε-greedy strategy adapts to the external environment better and faster, improving the speed of convergence.
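The two ideas in the abstract — a Q-learning update in which negative rewards lower the value of dangerous actions and propagate backward through the bootstrap term, and an ε that adapts to recent feedback — can be sketched as follows. The paper's exact adaptation rule is not given in the abstract, so `adapt_epsilon` below is a hypothetical illustration (shrink ε after positive feedback, grow it after negative feedback), not the authors' method:

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def adapt_epsilon(epsilon, reward, lo=0.05, hi=0.5, step=0.01):
    """Hypothetical adaptation rule: positive feedback reduces exploration,
    negative feedback increases it, clipped to [lo, hi]."""
    if reward > 0:
        return max(lo, epsilon - step)
    if reward < 0:
        return min(hi, epsilon + step)
    return epsilon

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update. A negative reward r < 0 lowers
    Q[s][a], and that penalty propagates to earlier states through the
    max over s_next in later updates."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Toy usage: in state 0, action 0 is "dangerous" (reward -1), action 1 is safe.
random.seed(0)
Q = [[0.0, 0.0], [0.0, 0.0]]
eps = 0.3
for _ in range(200):
    a = epsilon_greedy(Q[0], eps)
    r = -1.0 if a == 0 else 1.0
    q_update(Q, 0, a, r, 1)
    eps = adapt_epsilon(eps, r)
```

After training, the dangerous action's Q-value is well below the safe action's, so the greedy policy avoids it, which is the behaviour the abstract claims for negative-reward propagation.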
