Abstract

We study the optimization of average rewards of discrete-time nonhomogeneous Markov chains, in which the state spaces, transition probabilities, and reward functions depend on time. The analysis encounters a few major difficulties: 1) Notions crucial to homogeneous Markov chains, such as ergodicity, stationarity, periodicity, and connectivity, no longer apply; 2) The average-reward criterion is under-selective, i.e., it does not depend on the decisions in any finite period, and thus the problem is not amenable to dynamic programming; and 3) Because of this under-selectivity, an optimal average-reward policy may not be the best in any finite period.
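To make the under-selectivity concrete, the following is a minimal numerical sketch (not from the paper; the reward stream is hypothetical) showing that the Cesàro average (1/N) Σ r_t is insensitive to changes in any finite prefix of the rewards:

```python
# Under-selectivity of the average-reward criterion: altering the rewards
# in a finite prefix changes the Cesaro average (1/N) * sum(r_t) only by
# O(1/N), which vanishes in the limit.

def average_reward(rewards):
    """Cesaro average of a finite reward sequence (approximates the limit)."""
    return sum(rewards) / len(rewards)

N = 100_000
# Hypothetical time-varying reward stream (illustration only).
base = [1.0 if t % 2 == 0 else 0.0 for t in range(N)]
# Drastically change the first 10 rewards -- a finite-period deviation.
perturbed = [100.0] * 10 + base[10:]

a_base = average_reward(base)            # close to 0.5
a_pert = average_reward(perturbed)       # differs only by O(1/N)
```

As N grows, the gap between the two averages shrinks to zero, so the criterion cannot distinguish policies that differ only over a finite horizon.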
