Abstract

In this paper, we study new reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) under the average reward criterion. Starting from the discrete-time Bellman optimality equation, we derive novel RL algorithms in a straightforward way from incremental value iteration (IVI), stochastic shortest path (SSP) value iteration, and a bisection method. These algorithms estimate the optimal average reward directly, via IVI, SSP value iteration, and dichotomy respectively, which addresses the numerical instability of average-reward RL. Finally, a simulation experiment compares the convergence behavior of the three algorithms.
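To make the dichotomy idea concrete, here is a minimal planning-side sketch, not the paper's model-free RL algorithms: for a candidate gain rho, the discrete-time Bellman optimality equation h(s) = max_a [ r(s,a) - rho*tau(s,a) + sum_{s'} P(s'|s,a) h(s') ] induces a stochastic shortest path problem whose optimal cycle value through a reference state is decreasing in rho and vanishes exactly at the optimal average reward, so bisection recovers it. The tabular model arrays and the names ssp_value and bisect_gain are illustrative assumptions of ours, and a known model with positive sojourn times is assumed throughout.

```python
import numpy as np

def ssp_value(rho, P, r, tau, ref=0, tol=1e-10, max_iter=100_000):
    """Optimal expected total (r - rho * tau) reward collected from the
    reference state until the first return to it, via value iteration on
    the induced stochastic-shortest-path (SSP) problem: returning to
    `ref` terminates the episode, so h[ref] is pinned at 0."""
    n_states, n_actions, _ = P.shape
    h = np.zeros(n_states)
    v_ref = 0.0
    for _ in range(max_iter):
        # One Bellman backup: Q(s, a) = r - rho * tau + E[h(s')].
        q = r - rho * tau + np.einsum("sat,t->sa", P, h)
        h_new = q.max(axis=1)
        v_ref = h_new[ref]     # optimal value of one full cycle out of ref
        h_new[ref] = 0.0       # terminating copy of the reference state
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return v_ref

def bisect_gain(P, r, tau, lo, hi, ref=0, tol=1e-8):
    """Bisection on the candidate gain rho: ssp_value(rho) is strictly
    decreasing in rho (sojourn times are positive) and crosses zero
    exactly at the optimal average reward rho*."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ssp_value(mid, P, r, tau, ref) > 0.0:
            lo = mid           # positive surplus per cycle: rho too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy 2-state, 2-action SMDP with a known model (illustration only).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))    # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(2, 2))        # expected reward r(s, a)
tau = rng.uniform(0.5, 2.0, size=(2, 2))      # expected sojourn time
print("estimated optimal gain:", bisect_gain(P, r, tau, lo=0.0, hi=2.0))
```

The bracket [0, 2] is valid here because rewards lie in [0, 1] and sojourn times are at least 0.5, so the gain cannot exceed 2; in the model-free setting the paper studies, the same monotone structure is what lets the bisection update be driven by sampled estimates instead of exact backups.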
