Optimal and suboptimal stationary controls for Markov chains

P Varaiya

doi:10.1109/tac.1978.1101742

Abstract

The problem studied is that of controlling a Markov chain so as to minimize the long-run expected cost per unit time. Three results are obtained. First, a necessary and sufficient condition for optimality is given. The second gives, for any strategy $u$, an easily computable bound $B(u) \geq J(u) - J^{\ast}$, where $J^{\ast}$ is the minimum cost. The third result consists of an algorithm which, starting with any strategy, successively generates alternative strategies so that the bound $B(u)$ decreases monotonically to zero.
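
The abstract does not state the formula for the bound or the steps of the algorithm, so the following is only a rough illustration of the kind of computation involved, not the paper's method. It assumes a finite chain in which every stationary strategy has a single recurrent class, and it substitutes two textbook ingredients: the relative-value lower bound on the minimum average cost and Howard-style policy iteration. Here the bound is computed as J(u) minus the smallest one-step value c(i,a) + sum_j P(j|i,a) h(j) - h(i), where h is the relative value vector of u. The names `P`, `c`, `evaluate`, `bound_and_improve`, and `run`, and all numbers, are hypothetical.

```python
# Illustrative sketch only: textbook average-cost policy iteration with a
# relative-value bound. This is NOT claimed to be the paper's bound or algorithm.
# P[a][i][j]: probability of moving i -> j under action a (made-up numbers).
# c[a][i]:    cost of using action a in state i (made-up numbers).
P = [
    [[0.9, 0.1], [0.4, 0.6]],  # action 0
    [[0.5, 0.5], [0.8, 0.2]],  # action 1
]
c = [
    [1.0, 4.0],  # action 0
    [2.0, 5.0],  # action 1
]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, c, u):
    """Return (J, h) solving J + h[i] = c_u[i] + sum_j P_u[i][j] h[j], with h[0] = 0."""
    n = len(u)
    A = []
    for i in range(n):
        row = [1.0]  # coefficient of the unknown average cost J
        for j in range(1, n):
            row.append((1.0 if i == j else 0.0) - P[u[i]][i][j])
        A.append(row)
    x = solve(A, [c[u[i]][i] for i in range(n)])
    return x[0], [0.0] + x[1:]


def q(P, c, h, i, a):
    """One-step value c(i,a) + sum_j P(j|i,a) h(j) - h(i)."""
    return c[a][i] + sum(p * hj for p, hj in zip(P[a][i], h)) - h[i]


def bound_and_improve(P, c, u):
    """Return (J(u), bound on J(u) - J*, greedy strategy with respect to h)."""
    J, h = evaluate(P, c, u)
    actions = range(len(P))
    best = [min(actions, key=lambda a: q(P, c, h, i, a)) for i in range(len(u))]
    lower = min(q(P, c, h, i, best[i]) for i in range(len(u)))  # lower bound on J*
    return J, J - lower, best


def run(P, c, u, max_iter=20, tol=1e-9):
    """Iterate greedy improvement, recording (strategy, cost, bound) at each step."""
    history = []
    for _ in range(max_iter):
        J, B, best = bound_and_improve(P, c, u)
        history.append((list(u), J, B))
        if B < tol:
            break
        u = best
    return history


history = run(P, c, [1, 1])
for u, J, B in history:
    print(u, round(J, 4), round(B, 4))
```

On this toy example the iteration stops at the strategy `[0, 1]`, where the computed bound is zero up to rounding. The sketch makes no claim that this particular bound decreases monotonically in general; that property belongs to the algorithm of the paper, which is not reproduced here.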
