Abstract

In this paper we are concerned with stochastic optimal control problems over an infinite horizon. When the reward is discounted there is only one optimality criterion and standard dynamic programming can be applied. In the undiscounted case, however, the total reward may be unbounded for every feasible policy and there is no longer a unique optimality criterion. Greatest average reward, overtaking optimality, average overtaking optimality and 1-optimality are such criteria [4, 10]. The most widely used of these is the greatest average reward, and several techniques have been proposed to obtain an average optimal policy, at least for finite state space Markov decision processes (MDP) [1]. These results all rest on the existence of a fixed point solution to the optimality equations for a relative value function. In the case of a denumerable state space, weak conditions have recently been exhibited under which a fixed point solution exists [3]. In the case of finite state space non-homogeneous MDP the relative value function is also shown to converge under so-called weak ergodicity conditions [5]. While the latter assumptions are stronger, one can easily check them by inspection of the one-step transition probability matrices and also
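As a point of reference for the homogeneous, finite state space case mentioned above, the sketch below shows relative value iteration, one standard technique for computing an average optimal policy from a fixed point of the optimality equations for a relative value function. The function name, array layout and stopping tolerance are illustrative assumptions rather than anything taken from the paper, and convergence requires the usual unichain and aperiodicity conditions.

```python
import numpy as np

def relative_value_iteration(P, r, ref_state=0, tol=1e-8, max_iter=10_000):
    """Sketch of relative value iteration for a finite-state, finite-action MDP.

    P : array (A, S, S), P[a, i, j] = probability of moving from state i to j under action a.
    r : array (A, S),    r[a, i]    = one-step reward in state i under action a.
    Returns an estimate of the optimal average reward (gain), the relative
    value function h, and a greedy policy.
    """
    A, S, _ = P.shape
    h = np.zeros(S)                        # relative value function
    gain = 0.0
    for _ in range(max_iter):
        # One-step lookahead: Q[a, i] = r(i, a) + sum_j P(j | i, a) h(j)
        Q = r + P @ h                      # shape (A, S)
        h_new = Q.max(axis=0)
        gain = h_new[ref_state]            # normalize by a reference state
        h_new = h_new - gain
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = (r + P @ h).argmax(axis=0)    # greedy policy w.r.t. the relative values
    return gain, h, policy
```

Subtracting the value of a fixed reference state at each sweep keeps the iterates bounded; at the fixed point, the subtracted quantity is the optimal average reward and h plays the role of the relative value function in the optimality equations.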
