Abstract

To address problems of the traditional Q-Learning algorithm, such as heavily repeated and unbalanced exploration, the decayed ε-greedy strategy of the traditional algorithm was replaced with a reinforcement-exploration strategy, and a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed. First, the concept of a behavior utility trace was introduced, and the probability of each action being chosen was adjusted according to this trace, so as to improve the efficiency of exploration. Second, the decay of the exploration factor ε was designed in two phases: the first phase centers on exploration, the second shifts the focus from exploration to exploitation, and the exploration rate is dynamically adjusted according to the success rate. Finally, by maintaining a record of state visit counts, the exploration factor of the current state is adaptively adjusted according to the number of times that state has been visited. A symmetric grid-map environment was built on the OpenAI Gym platform to run simulation experiments on the Q-Learning algorithm, the self-adaptive Q-Learning (SA-Q) algorithm, and the SARE-Q algorithm. The experimental results show that the proposed algorithm has clear advantages over the first two algorithms in the average number of turns, the average success rate, and the number of times the shortest route is planned.
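As an illustration of the third mechanism, the sketch below adapts the exploration factor of a state to its visit count; the class name, decay schedule, and constants are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np

class AdaptiveEpsilon:
    """Hypothetical per-state exploration factor: the more often a state
    has been visited, the smaller its epsilon. The decay form and the
    constants below are illustrative assumptions, not the paper's values."""

    def __init__(self, eps_start=1.0, eps_min=0.05, decay=0.01, seed=0):
        self.eps_start, self.eps_min, self.decay = eps_start, eps_min, decay
        self.visit_counts = {}                  # record of state visit counts
        self.rng = np.random.default_rng(seed)

    def epsilon(self, state):
        n = self.visit_counts.get(state, 0)
        return max(self.eps_min, self.eps_start / (1.0 + self.decay * n))

    def select_action(self, q_row, state):
        # Record the visit, then explore or exploit with this state's epsilon.
        self.visit_counts[state] = self.visit_counts.get(state, 0) + 1
        if self.rng.random() < self.epsilon(state):
            return int(self.rng.integers(len(q_row)))   # explore
        return int(np.argmax(q_row))                    # exploit
```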

Highlights

  • Reinforcement learning (RL), one of the methodologies of machine learning, is used to describe and solve how an intelligent agent learns and optimizes its strategy while interacting with the environment [1]

  • The self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed in this study to tackle problems of the traditional Q-Learning algorithm, such as slow convergence and susceptibility to local optima

  • The route planning was simulated on the OpenAI Gym platform


Summary

Introduction

Reinforcement learning (RL), one of the methodologies of machine learning, is used to describe and solve how an intelligent agent learns and optimizes its strategy while interacting with the environment [1]. The agent acquires a reinforcement signal (reward feedback) from the environment during continuous interaction with it and adjusts its own action strategy through this feedback, aiming to maximize its gain. Different from supervised learning [2] and semi-supervised learning [3], RL does not need training samples to be collected in advance; during the interaction with the environment, the agent automatically learns to evaluate the actions it generates according to the rewards fed back from the environment, instead of being told the correct action directly. The Markov decision process is used by RL algorithms for environment modeling [4]. The value-function-based RL method is an important solution to the model-free RL problem.
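For reference, the standard tabular Q-Learning update that such value-function-based methods rely on is Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. A minimal sketch of one update step is given below; the grid size, learning rate, and discount factor are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal tabular Q-Learning update (the standard algorithm, not the
# SARE-Q variant); sizes and learning parameters are assumptions.
n_states, n_actions = 25, 4            # e.g. a 5 x 5 grid map
alpha, gamma = 0.1, 0.9                # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Example: a reward of 1.0 for taking action 2 in state 0 and reaching state 1.
q_update(s=0, a=2, r=1.0, s_next=1, done=False)
```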


