A basic formula for performance gradient estimation of semi-Markov decision processes

Yanjie Li, Fang Cao

doi:10.1016/j.ejor.2012.08.010

Journal: European Journal of Operational Research
Publication Date: Sep 10, 2012
Citations: 12

Affiliation: Harbin Institute of Technology

#Gradient Estimation Algorithms #Single Sample Path

Abstract

This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using this formula, we develop three sample-path-based gradient estimation algorithms, each of which uses a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.

Full Text