Abstract

Reinforcement learning (RL) is an efficient method for solving Markov decision processes (MDPs) without any a priori knowledge of the environment. Q-learning is a representative RL algorithm. Although Q-learning is guaranteed to converge to the optimal policy, it requires numerous trials to do so. Exploiting a property of the Q-values, this paper presents an accelerated RL method, Q-ae learning. Further, using the dynamic programming principle, the paper proves that Q-ae learning converges to the optimal policy under deterministic MDPs. Analytical and simulation results illustrate the efficiency of Q-ae learning under both deterministic and stochastic MDPs.
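The abstract does not give the Q-ae update rule itself; for context, the following is a minimal sketch of the standard tabular Q-learning baseline that Q-ae learning accelerates, run on a toy deterministic MDP. The chain environment, hyperparameter values, and episode count are illustrative assumptions, not taken from the paper.

```python
import random

# Toy deterministic chain MDP: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode. Purely illustrative.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Q-table over (state, action); hyperparameters are illustrative.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration.
            action = random.randrange(2) if random.random() < epsilon \
                     else max((0, 1), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # One-step temporal-difference update toward the Bellman target:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    # The greedy policy should move right toward the goal in every non-terminal state.
    print([max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The many trials this baseline needs to propagate reward information backward through the state space is precisely the cost that the paper's acceleration targets.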
