Abstract

In the steady state of an undiscounted Markov decision process, we consider the problem of finding an optimal stationary probability distribution that minimizes the variance of the single-transition reward, among all stationary probability distributions whose mean reward is not less than a specified value. The problem is a mathematical program with linear constraints and a non-linear objective. The solution technique replaces the non-linear part of the objective with a constant, inserts that constant as a constraint, and then parametrically analyzes the resulting linear program. Three numerical examples are discussed.
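A minimal sketch of the technique described above, under assumptions not in the original: a randomly generated toy MDP, state-action frequency variables x(s, a), and scipy.optimize.linprog as the LP solver. The exact parametric analysis is approximated here by a grid sweep over the fixed mean λ; with the mean fixed, the objective E[r²] − λ² becomes linear and each subproblem is an ordinary LP.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative small MDP (all data randomly generated; names are assumptions):
# P[a, s, s'] = transition probability, r[s, a] = one-step reward.
S, A = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(A, S))
r = rng.uniform(0.0, 1.0, size=(S, A))
n = S * A                                  # variables x(s, a), flattened row-major

# Linear constraints defining stationary state-action distributions:
#   sum_a x(s', a) = sum_{s, a} x(s, a) P(s' | s, a)   for every s'
#   sum_{s, a} x(s, a) = 1,  x >= 0
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] -= P[a, s, sp]
    for a in range(A):
        A_eq[sp, sp * A + a] += 1.0
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

rv, r2v = r.reshape(n), (r ** 2).reshape(n)

# Maximum achievable mean reward, used to pick a feasible sweep range.
max_mean = -linprog(-rv, A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).fun
m = 0.9 * max_mean                         # required minimum mean (assumed threshold)

best = (np.inf, None)
for lam in np.linspace(m, max_mean, 50):   # parametric sweep over the fixed mean
    # Fix the mean at lam; minimizing E[r^2] is then a linear program,
    # and the variance of the transition reward is E[r^2] - lam**2.
    res = linprog(r2v, A_eq=np.vstack([A_eq, rv]), b_eq=np.append(b_eq, lam),
                  bounds=(0, None))
    if res.success:
        best = min(best, (res.fun - lam ** 2, lam))

if best[1] is not None:
    print("approx. minimum variance %.4f at mean %.3f" % best)
```

This grid sweep only approximates the parametric linear programming analysis; the paper's approach traces the optimal basis as the constant varies rather than resolving the LP at sampled values.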
