Abstract

Solving reinforcement learning problems in continuous spaces with function approximation is currently a research hotspot in machine learning. When dealing with continuous-space problems, the classic Q-iteration algorithms based on lookup tables or function approximation converge slowly and have difficulty deriving a continuous policy. To overcome these weaknesses, we propose an algorithm named DFR-Sarsa(λ) based on double-layer fuzzy reasoning and prove its convergence. In this algorithm, the first reasoning layer uses fuzzy sets of the state to compute continuous actions; the second reasoning layer uses fuzzy sets of the action to compute the components of the Q-value. These two fuzzy layers are then combined to compute the Q-value function over the continuous action space. In addition, the algorithm uses the membership degrees of the activated rules in the two fuzzy reasoning layers to update the eligibility traces. When DFR-Sarsa(λ) is applied to the Mountain Car and Cart-pole Balancing problems, experimental results show that the algorithm not only obtains a continuous action policy but also achieves better convergence performance.
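The abstract does not give the exact rule forms, so the following is only a minimal sketch of how the two reasoning layers could be combined: the triangular membership functions, the normalization step, and the names `triangular`, `layer1_action`, `layer2_q`, and `q_weights` are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def triangular(x, centers, width):
    """Assumed triangular membership function (the paper's exact fuzzy sets are not given here)."""
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / width)

def layer1_action(state, state_centers, rule_actions, width=0.5):
    """First reasoning layer: fuzzy sets over the (1-D) state produce a continuous
    action as a membership-weighted combination of each rule's action."""
    phi = triangular(state, state_centers, width)   # membership degree of each state rule
    phi = phi / (phi.sum() + 1e-12)                 # normalize activations (assumption)
    action = float(phi @ rule_actions)              # weighted continuous action
    return action, phi

def layer2_q(action, phi, action_centers, q_weights, width=0.5):
    """Second reasoning layer: fuzzy sets over the action give Q-value components,
    combined with the state-layer memberships to form Q(s, a)."""
    mu = triangular(action, action_centers, width)  # membership degree of each action set
    mu = mu / (mu.sum() + 1e-12)
    # Q(s, a) = sum_i sum_j phi_i(s) * mu_j(a) * q_ij  (illustrative combination)
    return float(phi @ q_weights @ mu), mu
```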

Highlights

  • Reinforcement learning is a class of machine learning methods that obtains maximum cumulative reward by interacting with the environment [1, 2]

  • Though the classic Q-iteration algorithms based on a single fuzzy inference system can be used to solve continuous action space problems, they still converge slowly: at each iteration step of the learning process, a state-action pair may correspond to different Q-values because of the structure of the fuzzy inference system (FIS)

  • To address the problem that classic reinforcement learning algorithms based on lookup tables or function approximation converge slowly and have difficulty obtaining continuous action policies, this paper presents an algorithm with eligibility traces based on double-layer fuzzy reasoning, DFR-Sarsa(λ)


Summary

Introduction

Reinforcement learning is a class of machine learning methods that obtains maximum cumulative reward by interacting with the environment [1, 2]. Though the classic Q-iteration algorithms based on a single fuzzy inference system can be used to solve continuous action space problems, they still converge slowly: at each iteration step of the learning process, a state-action pair may correspond to different Q-values because of the structure of the FIS. If that step needs the Q-value of such a state-action pair to update the value function, the algorithm selects one of the Q-values at random, since there is no criterion for choosing the best one, which hurts the learning speed. Because this situation may occur many times during learning, it greatly slows the convergence rate. When DFR-Sarsa(λ) and other algorithms are applied to the Mountain Car and Cart-pole Balancing problems, the results show that DFR-Sarsa(λ) can obtain a continuous action policy and has better convergence performance.
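The summary only states that the membership degrees of the activated rules in the two fuzzy layers drive the eligibility-trace update, so the sketch below fills in the rest with a standard accumulating-trace Sarsa(λ) rule; the outer-product trace increment, the hyperparameter values, and the function name are assumptions rather than the paper's exact update.

```python
import numpy as np

def sarsa_lambda_update(q_weights, traces, phi, mu, delta,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One DFR-Sarsa(lambda)-style update (illustrative only).

    q_weights : (I, J) table of Q-value components for state rule i and action set j
    traces    : (I, J) eligibility traces
    phi, mu   : membership degrees of the activated rules in the two fuzzy layers
    delta     : TD error r + gamma * Q(s', a') - Q(s, a), computed outside
    """
    traces *= gamma * lam                # decay all traces
    traces += np.outer(phi, mu)          # accumulate by activation strength (assumption)
    q_weights += alpha * delta * traces  # TD update weighted by the traces
    return q_weights, traces
```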
