Abstract

Reinforcement learning (RL) is an efficient method for solving Markov decision processes (MDPs) without any a priori knowledge of the environment. Q-learning is a representative RL algorithm. Although Q-learning is guaranteed to converge to the optimal policy, it requires numerous trials to do so. Exploiting a property of the Q-values, this paper presents an accelerated RL method, Q-ae learning. Further, using the dynamic programming principle, the paper proves that Q-ae learning converges to the optimal policy under deterministic MDPs. Analytical and simulation results illustrate the efficiency of Q-ae learning under both deterministic and stochastic MDPs.
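The abstract does not give the Q-ae update rule itself; for context, the following is a minimal sketch of the standard tabular Q-learning baseline that Q-ae learning accelerates, run on a toy deterministic MDP. The chain environment, hyperparameter values, and episode count are illustrative assumptions, not taken from the paper.

```python
import random

# Toy deterministic chain MDP: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode. Purely illustrative.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Q-table over (state, action); hyperparameters are illustrative.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration.
            action = random.randrange(2) if random.random() < epsilon \
                     else max((0, 1), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # One-step temporal-difference update toward the Bellman target:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    # The greedy policy should move right toward the goal in every non-terminal state.
    print([max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The many trials this baseline needs to propagate reward information backward through the state space is precisely the cost that the paper's acceleration targets.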
