Kernel-based reinforcement learning in average-cost problems

D Ormoneit, P Glynn

doi:10.1109/tac.2002.803530

Abstract

Reinforcement learning (RL) is concerned with identifying optimal controls in Markov decision processes (MDPs) for which no explicit model of the transition probabilities is available. We propose a class of RL algorithms that always produce stable estimates of the value function. Specifically, we use local averaging methods to construct an approximate dynamic programming (ADP) algorithm; nearest-neighbor regression, grid-based approximations, and trees can all serve as the basis of this approximation. We provide a thorough theoretical analysis of this approach and demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. For practical implementation, we suggest reducing ADP to standard dynamic programming in an artificial finite-state MDP.
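The reduction to an artificial finite-state MDP can be illustrated with a small sketch. It assumes one-dimensional states, a batch of sampled transitions (x_s, c_s, y_s) for each action, and row-normalized Gaussian kernel weights as the local averager (nearest-neighbor, grid, or tree weights could be substituted). The function names, bandwidth, and solver settings are hypothetical. The states of the finite MDP are the sampled successor states; from any point, action a moves to the successor y_s of its s-th sample with probability equal to the normalized kernel weight of x_s. The finite MDP is then solved by relative value iteration with an aperiodicity transform, which is one standard dynamic programming choice for average-cost problems and not necessarily the solver used in the paper; convergence is assumed under the usual unichain conditions.

```python
import numpy as np


def kernel_weights(centers, queries, bandwidth):
    """Row-normalized Gaussian kernel weights (one row per query point)."""
    d2 = (queries[:, None] - centers[None, :]) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum(axis=1, keepdims=True)


def kernel_adp_average_cost(samples, bandwidth=0.3, tau=0.5, iters=5000, tol=1e-10):
    """Hypothetical sketch of kernel-based ADP for an average-cost MDP.

    samples maps each action a to equal-length arrays (x, c, y): sampled
    states x_s, observed one-step costs c_s, and successor states y_s.
    Returns (gain estimate, relative values on the support, support points).
    """
    actions = sorted(samples)
    # States of the artificial finite MDP: every sampled successor y_s.
    support = np.concatenate([samples[a][2] for a in actions])
    n = support.size
    P, C, offset = [], [], 0
    for a in actions:
        x, c, y = samples[a]
        W = kernel_weights(x, support, bandwidth)  # shape (n, len(x))
        Pa = np.zeros((n, n))
        # Transition sample j of action a leads to support index offset + j.
        Pa[:, offset:offset + len(y)] = W
        P.append(Pa)
        C.append(W @ c)  # locally averaged one-step cost
        offset += len(y)

    h = np.zeros(n)
    gain = 0.0
    for _ in range(iters):
        # Aperiodicity transform: stay put with probability tau.
        # The tau * h term is action-independent, so it is added after the min.
        q = np.stack([C[k] + (1 - tau) * (P[k] @ h) for k in range(len(actions))])
        th = q.min(axis=0) + tau * h
        gain = th[0]  # support point 0 serves as the reference state
        h_new = th - gain
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    # Undo the 1 / (1 - tau) scaling the transform puts on relative values.
    return gain, (1 - tau) * h, support


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 100
    # Hypothetical action 0 ("wait"): state drifts upward, cost equals state.
    x0 = rng.uniform(0.0, 1.0, m)
    y0 = np.clip(x0 + 0.1 + 0.05 * rng.standard_normal(m), 0.0, 1.0)
    # Hypothetical action 1 ("reset"): fixed cost 0.3, state returns near 0.
    x1 = rng.uniform(0.0, 1.0, m)
    y1 = rng.uniform(0.0, 0.1, m)
    g, h, support = kernel_adp_average_cost(
        {0: (x0, x0.copy(), y0), 1: (x1, np.full(m, 0.3), y1)}
    )
    print(f"estimated average cost: {g:.4f}")
```

Because the approximate value function only ever needs to be evaluated at the sampled successor states, the continuous-state problem collapses to an n-state MDP, where n is the total number of transition samples.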



Journal: IEEE Transactions on Automatic Control
Publication Date: Oct 1, 2002
Citations: 85

Similar Papers

Energy Saving and Interference Coordination in HetNets Using Dynamic Programming and CEC
Jose A Ayala-Romero ... Javier Vales-Alonso
IEEE Access | VOL. 6 | 01 Jan 2018

Semi-Markov adaptive critic heuristics with application to airline revenue management
Ketaki Kulkarni ... Katie Grantham
Journal of Control Theory and Applications | VOL. 9 | 19 Jul 2011

Use of Approximate Dynamic Programming for Production Optimization
Benjamin Van Roy ... Zheng Wen
21 Feb 2011

An approximate dynamic programming approach to the admission control of elective patients
Jian Zhang ... Abdellah El Moudni
Computers & Operations Research | VOL. 132 | 08 Mar 2021
