Abstract

To address problems of the traditional Q-Learning algorithm, such as heavily repeated and unbalanced exploration, the decayed ε-greedy strategy of the traditional algorithm was replaced with a reinforcement-exploration strategy, and a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed. First, the concept of a behavior utility trace was introduced, and the probability of each action being chosen was adjusted according to this trace, so as to improve the efficiency of exploration. Second, the decay of the exploration factor ε was designed in two phases: the first phase centers on exploration, the second shifts the focus from exploration to exploitation, and the exploration rate is dynamically adjusted according to the success rate. Finally, by maintaining a record of state visit counts, the exploration factor of the current state is adaptively adjusted according to the number of times that state has been visited. A symmetric grid-map environment was built on the OpenAI Gym platform to run simulation experiments on the Q-Learning algorithm, the self-adaptive Q-Learning (SA-Q) algorithm, and the SARE-Q algorithm. The experimental results show that the proposed algorithm has clear advantages over the first two algorithms in the average number of turns, the average success rate, and the number of times the shortest route is planned.
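As an illustration of the third mechanism, the sketch below adapts the exploration factor of a state to its visit count; the class name, decay schedule, and constants are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np

class AdaptiveEpsilon:
    """Hypothetical per-state exploration factor: the more often a state
    has been visited, the smaller its epsilon. The decay form and the
    constants below are illustrative assumptions, not the paper's values."""

    def __init__(self, eps_start=1.0, eps_min=0.05, decay=0.01, seed=0):
        self.eps_start, self.eps_min, self.decay = eps_start, eps_min, decay
        self.visit_counts = {}                  # record of state visit counts
        self.rng = np.random.default_rng(seed)

    def epsilon(self, state):
        n = self.visit_counts.get(state, 0)
        return max(self.eps_min, self.eps_start / (1.0 + self.decay * n))

    def select_action(self, q_row, state):
        # Record the visit, then explore or exploit with this state's epsilon.
        self.visit_counts[state] = self.visit_counts.get(state, 0) + 1
        if self.rng.random() < self.epsilon(state):
            return int(self.rng.integers(len(q_row)))   # explore
        return int(np.argmax(q_row))                    # exploit
```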

Highlights

  • Reinforcement learning (RL), one of the methodologies of machine learning, is used to describe and solve how an intelligent agent learns and optimizes its strategy while interacting with the environment [1]

  • The self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed in this study to tackle problems of the traditional Q-Learning algorithm, such as slow convergence and susceptibility to local optima

  • The route planning was simulated on the OpenAI Gym platform


Summary

Introduction

Reinforcement learning (RL), one of the methodologies of machine learning, is used to describe and solve how an intelligent agent learns and optimizes its strategy while interacting with the environment [1]. The agent acquires a reinforcement signal (reward feedback) from the environment during continuous interaction with it and adjusts its own action strategy through this feedback, aiming to maximize its gain. Different from supervised learning [2] and semi-supervised learning [3], RL does not need training samples to be collected in advance; during the interaction with the environment, the agent automatically learns to evaluate the actions it generates according to the rewards fed back from the environment, instead of being told the correct action directly. The Markov decision process is used by RL algorithms for environment modeling [4]. The value-function-based RL method is an important solution to the model-free RL problem.
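For reference, the standard tabular Q-Learning update that such value-function-based methods rely on is Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. A minimal sketch of one update step is given below; the grid size, learning rate, and discount factor are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal tabular Q-Learning update (the standard algorithm, not the
# SARE-Q variant); sizes and learning parameters are assumptions.
n_states, n_actions = 25, 4            # e.g. a 5 x 5 grid map
alpha, gamma = 0.1, 0.9                # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Example: a reward of 1.0 for taking action 2 in state 0 and reaching state 1.
q_update(s=0, a=2, r=1.0, s_next=1, done=False)
```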


